Abort asynchronous task

The examples below suggest that async_std::task::JoinHandle::cancel and tokio::task::JoinHandle::abort don't take effect if the tasks the handles reference are caught up in an empty endless loop. So when are these calls guaranteed to take effect and how likely is it that the current behavior will change in the future?

use async_std::task::{block_on, sleep, spawn};

async fn forever() {
    loop {}
}

fn main() {
    block_on(async {
        let handle = spawn(forever());
        sleep(std::time::Duration::from_secs(1)).await;
        println!("{:?}", handle.cancel().await);
    });
}

use tokio::{runtime::Runtime, spawn, time::sleep};

async fn forever() {
    loop {}
}

fn main() {
    let rt = Runtime::new().unwrap();
    rt.block_on(async {
        let handle = spawn(forever());
        sleep(std::time::Duration::from_secs(1)).await;
        handle.abort();
        println!("{:?}", handle.await);
    });
}

This is correct. In general, there is no safe way to force a thread of execution to stop what it is doing without the code being executed (forever() in your case) providing ways to be cancelled.

In synchronous code you might do this with some sort of while keep_running() { ...} loop or if should_stop() { return } which checks to see whether the operation has been cancelled and halts accordingly.
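
To make the synchronous case concrete, here is a minimal sketch of the should_stop() pattern using only the standard library. The names worker and run_and_cancel are made up for illustration, and the 1 ms sleep merely stands in for a unit of real work.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// Runs until `stop` is set and returns how many iterations completed.
/// Checking `stop` plays the role of the `should_stop()` call.
fn worker(stop: &AtomicBool) -> u64 {
    let mut iterations = 0;
    while !stop.load(Ordering::Relaxed) {
        iterations += 1;
        // Placeholder for one unit of real work.
        thread::sleep(Duration::from_millis(1));
    }
    iterations
}

/// Spawns the worker, lets it run for `run_for`, then asks it to stop.
fn run_and_cancel(run_for: Duration) -> u64 {
    let stop = Arc::new(AtomicBool::new(false));
    let flag = Arc::clone(&stop);
    let handle = thread::spawn(move || worker(&flag));

    thread::sleep(run_for);
    stop.store(true, Ordering::Relaxed); // request cancellation
    handle.join().expect("worker panicked") // the worker exits on its own
}

fn main() {
    let iterations = run_and_cancel(Duration::from_millis(50));
    println!("worker stopped cleanly after {iterations} iterations");
}
```

The thread is never stopped from the outside; it notices the flag at a point of its own choosing, so its locals are dropped normally.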

Async code is a bit more lenient in that futures are objects which are polled repeatedly, with each poll letting the future execute all the operations up to the next yield point (i.e. an .await). That means tokio can implement cancellation by just not polling the future any more once you've called JoinHandle::abort().

This still has the issue that if we never reach a yield point (i.e. we're stuck in a loop { }), tokio will never have a chance to stop polling the future.
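
The polling model can be sketched without any runtime at all. The following is only an illustration, not how tokio or async-std are implemented: YieldNow, NoopWaker and poll_then_cancel are hypothetical names, and the "runtime" is just a for loop that polls by hand and then drops the future.

```rust
use std::cell::Cell;
use std::future::Future;
use std::pin::Pin;
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical stand-in for a runtime's yield function: pending once, then ready.
struct YieldNow(bool);

impl Future for YieldNow {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            Poll::Ready(())
        } else {
            self.0 = true;
            Poll::Pending
        }
    }
}

/// An endless loop that, unlike `loop {}`, has a yield point in every iteration.
async fn forever(counter: Rc<Cell<u32>>) {
    loop {
        counter.set(counter.get() + 1);
        YieldNow(false).await;
    }
}

/// A waker that does nothing, which is enough for polling by hand.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Polls `forever` exactly `polls` times, then cancels it by dropping it.
fn poll_then_cancel(polls: u32) -> u32 {
    let counter = Rc::new(Cell::new(0));
    let mut task = Box::pin(forever(Rc::clone(&counter)));
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);

    for _ in 0..polls {
        // Each poll runs the body up to the next `.await` and returns Pending.
        assert!(task.as_mut().poll(&mut cx).is_pending());
    }
    drop(task); // cancellation: the future is simply never polled again
    counter.get()
}

fn main() {
    // prints "iterations before cancellation: 3"
    println!("iterations before cancellation: {}", poll_then_cancel(3));
}
```

If the body of forever were a bare loop {}, the very first call to poll would never return, and the for loop playing the runtime would never get control back to drop the task.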

I know languages like C# provide mechanisms for this, but their use is strongly discouraged and generally accepted to be a bad idea. There are also unsafe APIs like pthread_cancel() in "asynchronous" mode, but that's more akin to stopping the loop by pulling your CPU out of its socket and can leave your application in a broken state.

Some async runtimes will have things which try to detect when a future has locked up and try to alert the user, but in general there isn't much the runtime can do when they've been given code that misbehaves like this.

You can read more about the phenomenon in the article Async: What is blocking?. In general, the runtime can only abort something that is currently suspended at an .await.

What if the task to be cancelled is running when abort is called but gets suspended later? Does the runtime remember that previous abort?

So cancellation in this context basically means "don't ever run this task again", right? Which makes me wonder why there is no abort for std::thread::JoinHandle. Stopping a thread no matter what is like "stopping the loop by pulling your CPU out of its socket", but asking the scheduler not to ever run a thread again doesn't seem to be that brutal.

...is impossible unless the scheduler provides some way to do this. And, AFAIK, no system scheduler out there does.

Yes, it is.

By forcing the thread to stop without giving it a chance to run any clean-up code you won't free any memory it owns (possibly causing memory leak), release any locks it may be holding (possibly deadlocking your application), or run any destructors (possibly leaving your app in a broken state).

The runtime will have some metadata which is used to keep track of any tasks it has been given. This will contain things like an ID, the Future to run, and a way to track what state it is in - i.e. sleeping (e.g. waiting for IO), pending (waiting to run), running, or cancelled.

Under the hood, calling the cancel() method would do something like runtime.set_task_state(task_id, State::Cancelled). Then when the runtime pops the next piece of work off its queue it'll check the state and notice it was cancelled, so the task will be destroyed.

The implementation varies from runtime to runtime, but that's the general gist.
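
As a rough sketch of that bookkeeping, assuming a single-threaded toy scheduler (every name here, including Runtime, Task and tick, is invented for illustration and does not correspond to any real runtime's internals):

```rust
use std::collections::{HashMap, VecDeque};

/// Simplified task states; a real runtime tracks more (sleeping, running, ...).
#[derive(Clone, Copy, PartialEq, Debug)]
enum State {
    Pending,
    Cancelled,
}

/// Stand-in for a spawned future: we only count how often it was polled.
struct Task {
    state: State,
    polls: u32,
}

#[derive(Default)]
struct Runtime {
    tasks: HashMap<u32, Task>,
    queue: VecDeque<u32>,
    next_id: u32,
}

impl Runtime {
    /// Registers a never-finishing task and queues it for its first poll.
    fn spawn(&mut self) -> u32 {
        let id = self.next_id;
        self.next_id += 1;
        self.tasks.insert(id, Task { state: State::Pending, polls: 0 });
        self.queue.push_back(id);
        id
    }

    /// Only records the new state; nothing is interrupted.
    fn set_task_state(&mut self, id: u32, state: State) {
        if let Some(task) = self.tasks.get_mut(&id) {
            task.state = state;
        }
    }

    /// Pops one piece of work; returns false once the queue is empty.
    fn tick(&mut self) -> bool {
        let id = match self.queue.pop_front() {
            Some(id) => id,
            None => return false,
        };
        if let Some(task) = self.tasks.get_mut(&id) {
            if task.state == State::Pending {
                task.polls += 1; // "poll" the task up to its next yield point
                self.queue.push_back(id); // it yielded, so queue it again
            }
            // Cancelled: not polled and not requeued. A real runtime would
            // drop the future at this point.
        }
        true
    }

    fn polls(&self, id: u32) -> u32 {
        self.tasks.get(&id).map_or(0, |task| task.polls)
    }
}

/// Polls a task twice, cancels it, then drains the queue.
fn demo() -> u32 {
    let mut rt = Runtime::default();
    let id = rt.spawn();
    rt.tick();
    rt.tick();
    rt.set_task_state(id, State::Cancelled);
    while rt.tick() {}
    rt.polls(id)
}

fn main() {
    // prints "polls before cancellation took effect: 2"
    println!("polls before cancellation took effect: {}", demo());
}
```

Note that set_task_state does nothing on its own; the cancellation only takes effect the next time the scheduler looks at the task, which is why a task that never yields is never affected.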

This is correct. However there is no "polling" with normal OS threads, so you don't have any "don't ever poll this task again" mechanism. OS threads just run a function to completion and when that function exits the thread is killed.

Yes, the runtime will remember your call to abort and immediately abort it next time it gets suspended.

Right, so it is more like "don't ever run that task again and run the top-level future's destructor so that the resources it owns get recursively dropped". Am I still missing something?

For instance in the following example consumer.abort() causes r to get dropped which in turn wakes producer up and lets the s.send(i).await operation complete. If r wasn't dropped, producer.await would never return. Right?

use std::time::Duration;
use tokio::{spawn, sync::mpsc::channel, time::sleep};

#[tokio::main]
async fn main() {
    let (s, mut r) = channel(1);

    let producer = spawn(async move {
        let mut i = 1;
        loop {
            if s.send(i).await.is_err() {
                return i;
            }
            i = i + 1;
        }
    });

    let consumer = spawn(async move {
        loop {
            let x = r.recv().await.unwrap();
            sleep(Duration::from_secs(x)).await;
        }
    });

    sleep(Duration::from_millis(100)).await;

    consumer.abort();

    let p = producer.await;
    println!("p: {:?}", p);

    let c = consumer.await;
    println!("c: {:?}", c);
}
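
The "dropping the future drops what it owns" part can be checked in isolation, without tokio. This is a sketch under some assumptions: it swaps in std::sync::mpsc for tokio's channel, polls the future by hand with a do-nothing waker, and uses the made-up names consumer, NoopWaker and abort_consumer.

```rust
use std::future::{pending, Future};
use std::sync::mpsc::{channel, Receiver};
use std::sync::Arc;
use std::task::{Context, Wake, Waker};

/// Owns the receiver across an `.await` that never completes.
async fn consumer(r: Receiver<u32>) {
    let _r = r; // lives inside the future for as long as the future lives
    pending::<()>().await;
}

/// A waker that does nothing, which is enough for polling by hand.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Returns (send succeeded before the drop, send failed after the drop).
fn abort_consumer() -> (bool, bool) {
    let (s, r) = channel();
    let mut task = Box::pin(consumer(r));
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);

    // Run the task up to its first (and only) yield point.
    assert!(task.as_mut().poll(&mut cx).is_pending());
    let ok_before = s.send(1).is_ok();

    // "Abort": never poll again and drop the future, which drops the receiver.
    drop(task);
    let err_after = s.send(2).is_err();

    (ok_before, err_after)
}

fn main() {
    // prints "(true, true)"
    println!("{:?}", abort_consumer());
}
```

The second send fails because the receiver was dropped together with the future that owned it, which mirrors what lets the producer return in the tokio example.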

So as a thought experiment, would it be possible/useful to equip threads with a cancellation feature provided that the system scheduler could be told not to ever run a specific thread again? Apparently the hard part is not to remove the thread from the scheduler's ready queue (and make sure it never enters it again) but to safely unwind the call stack.

That's correct.

Correct.

Well, possibly, but to do it in a useful way, you have to be specific about where you can be aborted and where you can't. For example, you really don't want a thread to be aborted in the middle of allocating memory — that might leave the allocator's data structures in an invalid state and could break all future attempts to allocate memory.

In async Rust, the .await points serve this purpose.

Writing algorithms that can be aborted at any point is very very difficult, and often impossible.

The challenge is to define what part of which state you would want to survive a cancellation (and continue using) vs. what part of the state you're ready to abandon and reclaim.

There has been prior work in this area: one idea is to define an abstraction for the state you wish to retain (the "kernel") and the state you are ok to abandon (a "process", typically with multiple threads) to use the analogy to OS. Then this "red line" that separates the 2 needs to be defined in all parts of the system (libraries) where you would allow cancellation. Code inside the kernel delays termination and guarantees it will back out (not unwind), whereas code outside the kernel can be terminated at will and if it is, there are no visible effects on the surviving system (ignoring external effects).

Here's the thing. Rust may actually be the first language that could pull this off, due to its ownership system. In languages like Java that allow uncontrolled sharing of data structures, it's very difficult to identify what points where and what can be safely reclaimed in the case of asynchronous cancellation. In addition, Rust already aims to be able to terminate individual threads via panic! while unwinding their stacks. Moreover, Rust's lack of exceptions can help to safely recover and back out of non-terminable code sections in which asynchronous cancellation must be delayed (the "kernel"). The current effort to put Rust into the Linux kernel will definitely help here, as the Linux kernel is an environment in which panic! would not be tolerated.

I sense a future research project here.

The system-level API for this is generally something that terminates the process, which ensures system-level safety guarantees (resources are released, files closed, and so on) at the expense of removing control from the application being terminated. On POSIX-like systems, for example, you can unconditionally terminate a thread by sending SIGKILL to its process with kill(2), which takes every other thread in that process down with it. You're guaranteed that no part of the process will see inconsistent state afterwards, because no part of the process will run after the signal is delivered. Windows has some slightly finer-grained tools, but they're similarly immediate.

In general, this is the finest level of granularity the OS can provide without support from the application.
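
From Rust, process-level termination is reachable through std::process::Child::kill, which on Unix platforms sends SIGKILL. The sketch below assumes a Unix-like system with a sleep binary on the PATH; spawn_and_kill is a made-up helper name.

```rust
use std::io;
use std::os::unix::process::ExitStatusExt;
use std::process::{Command, Stdio};

/// Starts a long-running child process, kills it, and reports the signal
/// that ended it (assumes a Unix-like system with `sleep` on the PATH).
fn spawn_and_kill() -> io::Result<Option<i32>> {
    let mut child = Command::new("sleep")
        .arg("30")
        .stdin(Stdio::null())
        .spawn()?;

    // On Unix this sends SIGKILL: the child gets no chance to clean up,
    // but the OS reclaims its memory and file descriptors.
    child.kill()?;

    // Reap the child and inspect how it ended.
    let status = child.wait()?;
    Ok(status.signal())
}

fn main() -> io::Result<()> {
    println!("child terminated by signal {:?}", spawn_and_kill()?);
    Ok(())
}
```

The parent stays consistent only because the killed code lived in a separate address space; nothing comparable exists for a single thread inside the same process.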
