Understanding async cancelation

I watched the Rust conference video on async cancelation safety and it made me wonder if there is a better way.

It seems to me like the rusty approach to this would be the Undroppable traits that were mentioned, combined with all the combinators (like select!) returning the unfinished futures. This would force cancelation-unsafe futures to be processed to completion.

Is there a reason select! calls and such don't return the unfinished futures? I realize this would be much more verbose, but it feels like there are too many footguns in async otherwise.

For the undroppable futures they could implement a cancel trait. For example the channel that loses the message when canceled, could instead implement cancel by returning the message back to the caller. This would consume the future without dropping it, and there would be no cancelation issues.
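For illustration, here is a minimal synchronous sketch of that idea. The `Cancel` trait, `PendingSend`, `BoundedQueue` and `send_or_recover` are all made-up names, not an existing API. Nothing in today's Rust stops a caller from simply dropping `PendingSend` instead of calling `cancel`, so this only shows the shape of the API, not the enforcement.

```rust
use std::collections::VecDeque;

/// Hypothetical trait: consume a pending operation and hand back
/// whatever it still owns, instead of silently dropping it.
trait Cancel {
    type Residue;
    fn cancel(self) -> Self::Residue;
}

/// Toy bounded queue standing in for a channel (assumption, not a real channel).
struct BoundedQueue<T> {
    items: VecDeque<T>,
    capacity: usize,
}

impl<T> BoundedQueue<T> {
    fn new(capacity: usize) -> Self {
        Self { items: VecDeque::new(), capacity }
    }
}

/// A pending send that owns the message until it is delivered.
struct PendingSend<T> {
    msg: T,
}

impl<T> PendingSend<T> {
    /// One delivery attempt. If the queue is full, the pending send is
    /// handed back to the caller rather than being lost.
    fn try_complete(self, queue: &mut BoundedQueue<T>) -> Result<(), Self> {
        if queue.items.len() < queue.capacity {
            queue.items.push_back(self.msg);
            Ok(())
        } else {
            Err(self)
        }
    }
}

impl<T> Cancel for PendingSend<T> {
    type Residue = T;

    /// Cancelling returns the unsent message to the caller.
    fn cancel(self) -> T {
        self.msg
    }
}

/// Try to send once; on failure, cancel and recover the message.
fn send_or_recover(queue: &mut BoundedQueue<String>, msg: String) -> Option<String> {
    match (PendingSend { msg }).try_complete(queue) {
        Ok(()) => None,
        Err(pending) => Some(pending.cancel()),
    }
}
```

With a queue of capacity 1, the first `send_or_recover` call returns `None` (delivered) and the second returns the message back in `Some(..)`.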

You could wrap an undroppable trait in a Cancelation future that provides the behavior on drop, such as discarding the unsent channel message or caching it for later.

Thoughts? Is there anything I am missing for why this sort of approach isn't taken?

Nothing much, just the fact that one would need a radically different language to implement it, with linear types instead of affine types.

It's not just “an undroppable trait”, it's a radically different design of the whole language.

It was discussed, of course, but async was needed “yesterday” for marketing reasons when it was introduced, so Rust got the best implementation that was compatible with the language as it existed back then.

Ok, thank you for the response.

Is there any chance a Rust 2.0 could rethink these things so that there is a path to a better future?

The current work is on async Drop (which is only a bit worse), so you won't have to wait for Rust 2.0.

There are many gotchas around cancellation, but IMHO it's not a single problem, and they don't have a single cause.

There are some unexpected behaviors, but that doesn’t automatically mean that the functionality is incorrect. For example, this is a gotcha:

try_join!(shutdown_a, shutdown_b);

When one of the futures returns an error, the other will be cancelled, and may not finish shutting down. This may leave some tasks running in the background that shouldn't be.

I don't blame the user here for using anything wrong. The behavior is unexpected, and it's easy to overlook.

But it's not obvious that this is a wrong behavior, because in another context the same behavior is desirable:

let (file_a, file_b) = try_join!(download_a, download_b);

In this example, the function will return with an error if either of the two downloads fails. It would be pretty annoying if try_join waited for the other download to finish completely, only to discard it immediately before returning with an error.
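Both cases rest on the same mechanics: returning early drops the sibling future wherever it happens to be. Here is a rough std-only sketch of that. `try_join_blocking` is a hand-rolled, busy-polling stand-in for try_join! (it is not the real macro), and `Shutdown`, `NoopWake` and `demo` are hypothetical names used only for this illustration.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Waker that does nothing; enough for a busy-polling demo.
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Hypothetical future: needs `polls_left` more polls, then yields `result`
/// and records that it ran to completion.
struct Shutdown {
    polls_left: u32,
    finished: Arc<AtomicBool>,
    result: Result<(), &'static str>,
}

impl Future for Shutdown {
    type Output = Result<(), &'static str>;

    fn poll(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<Self::Output> {
        if self.polls_left == 0 {
            self.finished.store(true, Ordering::SeqCst);
            Poll::Ready(self.result)
        } else {
            self.polls_left -= 1;
            Poll::Pending
        }
    }
}

/// Simplified stand-in for try_join!: polls both futures in turn and
/// returns on the first error. The early return drops the other future.
fn try_join_blocking<A, B>(a: A, b: B) -> Result<(), &'static str>
where
    A: Future<Output = Result<(), &'static str>>,
    B: Future<Output = Result<(), &'static str>>,
{
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut a = Box::pin(a);
    let mut b = Box::pin(b);
    let (mut a_done, mut b_done) = (false, false);
    loop {
        if !a_done {
            if let Poll::Ready(res) = a.as_mut().poll(&mut cx) {
                res?; // on Err, `b` is dropped (cancelled) right here
                a_done = true;
            }
        }
        if !b_done {
            if let Poll::Ready(res) = b.as_mut().poll(&mut cx) {
                res?; // on Err, `a` is dropped (cancelled) right here
                b_done = true;
            }
        }
        if a_done && b_done {
            return Ok(());
        }
    }
}

/// `a` fails on its second poll while `b` still needs more polls.
fn demo() -> (Result<(), &'static str>, bool) {
    let b_finished = Arc::new(AtomicBool::new(false));
    let a = Shutdown {
        polls_left: 1,
        finished: Arc::new(AtomicBool::new(false)),
        result: Err("a failed"),
    };
    let b = Shutdown {
        polls_left: 3,
        finished: b_finished.clone(),
        result: Ok(()),
    };
    let outcome = try_join_blocking(a, b);
    (outcome, b_finished.load(Ordering::SeqCst))
}
```

`demo()` yields the error from `a` together with `false` for "did `b` finish". Whether that `false` is a bug (shutdown) or a saving (download) is exactly the context-dependent part.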


When a future from shutdown() is cancelled and doesn't abort the work happening elsewhere, it's a footgun and an undesirable behavior. OTOH tokio::spawn() returns a future that also doesn't abort the work happening elsewhere, but in this context this is a feature, not a bug.

Combining the two, tokio::spawn(shutdown) cancels the cancellation, and works great! But tokio::spawn isn't magic that fixes cancellation, because tokio::spawn(download) would waste bandwidth and memory collecting cancelled results that can't be used.


Even the same API call, running exactly the same code, can have different desired behavior for cancellation:

request("/api/logout").await;

vs

request("/api/long_poll_notifications").await;

When a program shuts down, you may want to give a logout API call a chance to finish[1]. OTOH a request that waits for a new message to arrive could sit there indefinitely, and you'll want it to be cancellable without waiting.


Code like timeout(channel.send(msg)) will irreversibly drop the unsent message on cancellation (timeout). Whether this is good or not is also context-dependent: it could be storage_queue.send(precious_data), where you want the message to always be delivered, but it could also be playback.send(beep_sound), where it's fine to drop the message if the speaker is already too busy playing too many beeps.
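A small std-only sketch of where the message goes in that situation. Everything here is an assumption for illustration: `SendFuture` is a toy send that owns its message, `Slot` is a one-element mailbox standing in for a full bounded channel, and `poll_with_budget` imitates a timeout by giving up after a fixed number of polls. None of this is Tokio's actual implementation.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};

/// Waker that does nothing; enough for a busy-polling demo.
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// One-element mailbox standing in for a bounded channel of capacity 1.
type Slot = Arc<Mutex<Option<String>>>;

/// Toy send future: it owns the message until the slot has room.
struct SendFuture {
    slot: Slot,
    msg: Option<String>,
}

impl Future for SendFuture {
    type Output = ();

    fn poll(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        let this = &mut *self;
        let mut slot = this.slot.lock().unwrap();
        if slot.is_none() {
            // Room available: move the message into the slot.
            *slot = this.msg.take();
            Poll::Ready(())
        } else {
            // Slot still occupied: keep holding the message.
            Poll::Pending
        }
    }
}

/// Stand-in for a timeout: give up after `max_polls` polls.
fn poll_with_budget<F: Future>(fut: F, max_polls: usize) -> Option<F::Output> {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    for _ in 0..max_polls {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return Some(out);
        }
    }
    None // `fut` is dropped here, together with the message it still owns
}

/// The slot is full and nobody drains it, so the send "times out".
fn demo() -> (bool, Option<String>) {
    let slot: Slot = Arc::new(Mutex::new(Some("old".to_string())));
    let send = SendFuture {
        slot: slot.clone(),
        msg: Some("precious".to_string()),
    };
    let delivered = poll_with_budget(send, 3).is_some();
    // Drain the slot afterwards: only the old message is there.
    let drained = slot.lock().unwrap().take();
    (delivered, drained)
}
```

After `demo()`, the send was not delivered and the slot only ever contained "old". The "precious" message was owned by the future and disappeared with it, which is fine for a beep and not fine for a storage queue.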


So this is why I don't like the framing of futures as "cancel safe" and "not cancel safe": it has a built-in judgement that one way is better than the other, but that's not universal.

APIs like async Drop or Forget are very useful, and necessary to reliably handle cases that must run to completion, but they're not a complete solution, because there are also plenty of cases where it's intended and perfectly fine to abort futures immediately and drop their state. So we're also missing a way to express the intention, and make sure it composes well (or fails loudly) with combinators like try_join! or select!.


  1. I know such a solution, implemented naively, is unreliable anyway, but there are cases where even a best-effort shutdown/cleanup is useful.

The best way to work with cancellation, in my experience, is to use structured concurrency to avoid it altogether. For example, I never use select loops, but merge the streams instead (sometimes unfolded). You can find a lot of info online.

Isn't it defined as "the future is cancel safe iff it is logically a no-op to drop it and make a new instance"?
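As a rough illustration of that definition, here is a std-only sketch of a receive-style future that keeps no state of its own, so dropping it and creating a new one loses nothing. `Recv`, `poll_once`, `NoopWake` and `demo` are hypothetical names for this example, and the queue is a plain `Mutex<VecDeque<u32>>` rather than a real channel.

```rust
use std::collections::VecDeque;
use std::future::Future;
use std::pin::Pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};

/// Waker that does nothing; enough for a single-poll demo.
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Toy receive future: all state lives in the shared queue,
/// none in the future itself.
struct Recv<'a> {
    queue: &'a Mutex<VecDeque<u32>>,
}

impl Future for Recv<'_> {
    type Output = u32;

    fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<u32> {
        let next = self.queue.lock().unwrap().pop_front();
        match next {
            Some(value) => Poll::Ready(value),
            None => Poll::Pending,
        }
    }
}

/// Poll a future exactly once, then drop it (i.e. cancel it if pending).
fn poll_once<F: Future>(fut: F) -> Poll<F::Output> {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    fut.as_mut().poll(&mut cx)
}

/// Cancel a pending receive, then receive again with a fresh instance.
fn demo() -> (bool, Poll<u32>) {
    let queue = Mutex::new(VecDeque::new());
    // First receive finds nothing and is dropped while pending.
    let first = poll_once(Recv { queue: &queue });
    queue.lock().unwrap().push_back(7);
    // A brand-new instance picks up the value; nothing was lost.
    let second = poll_once(Recv { queue: &queue });
    (first.is_pending(), second)
}
```

Here the first `Recv` is cancelled while pending and the second one still receives the 7, which is what "logically a no-op to drop it and make a new instance" looks like in the simplest case.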

There is a spectrum, yes, all the way up to "if you drop the future of .write_all on a network socket, further attempts to write there will cause unspecified behavior by messing up the protocol".

It's not a good choice of terminology. Even though the "safety" can be used to mean "it's unharmed by cancellation", and is technically correct in a strict sense, it needlessly uses the word "safety" when Rust already has a quite prominent and important unsafe-related concept of safety, and "cancel safety" isn't related to it at all.

Additionally, safe/unsafe words carry positive/negative association. This is especially significant in Rust's context, where the goal is to minimize (the other kind of) unsafety and make programs as safe as possible.

A future being no-op on cancellation is at best a neutral description of behavior, not a goal like the other "safety" in Rust.

It would be silly to call Box "drop unsafe" and references "drop safe", even though that is technically correct. There are plenty of cases where references are the right choice and dropping must be no-op, but that doesn't make Box deserve being called "drop unsafe".

I could introduce a definition that flips it the other way: a Future that leaves external state behind when cancelled is state leaking. tokio::spawn()'s join handle is state leaking, because it doesn't drop the task when the Future is dropped. But state leaking is a bad term for similar reasons: it's not the same thing as the other use of "leaking" in Rust, and it's also judgemental.
