State of `async/await`: unrestrained cooperation is not cooperative

Disclaimer: deliberately strong language ahead. No offense is implied or intended, nor is this an attempt to devalue the work of the countless people that led up to this point. To keep this discussion as constructive as possible, an LCD (lowest-common-denominator) solution is provided for every issue / opinion / argument presented. Reader's discretion advised.


I've been looking into Rust's approach to async/await for a while now. The deeper into the weeds I get, the stronger my "something's clearly wrong here" sense has grown: from the lack of any comprehensive end-to-end resource on the matter to the mountain of edge cases to consider.

The following is an attempt to piece together the current state of affairs - as of June 2025 - one; to highlight some of the most glaring (IMPO) shortcomings of the current design and implementation, two; and to brainstorm the most sensible / efficient / productive way forward from here, three.

In order from the least to the most significant:

[1]

On the documentation side, having both the async and the await keyword say:

We have written an async book detailing async/await and trade-offs compared to using threads.

When the actual book in question immediately contradicts the "written" part:

NOTE: this guide is currently undergoing a rewrite after a long time without much work. It is work in progress, much is missing, and what exists is a bit rough.

Lands somewhere between "a bit surprising" and "outright embarrassing" for me. Why does it say "written" when it's clearly "unfinished"? Is the documentation wrong/out-of-date? Is there some fully "written" version of the book elsewhere that the new "undergoing" rewrite fails to mention? What should be expected of a newcomer who stumbles upon this bit - other than utter confusion?

Solution/s
  • update the keyword docs to stop claiming the async book is "written": link it as the work-in-progress rewrite it currently is, or point to a finished resource instead

[2]

Fragmentation of the ecosystem.

I'm not talking about the choice between tokio or smol or async_std. I'm talking about the absolutely gargantuan amount of carefully plugging A-from-X into B-from-Y, all while making sure neither X nor Y uses the set of utils or wrappers or adapters from Z - which you might still need D and E from - while also making sure you don't use any of the F or G or H from it, since those were reimplemented in X or Y altogether and are no longer compatible.

Example: opening a file, reading it line by line, enumerating each one in the process. [1]

std
fn read_file_sync() -> std::io::Result<()> {
    // #1
    use std::fs::OpenOptions;
    let mut open = OpenOptions::new();
    let file = open.read(true).open("./foo.txt")?;
    // #2
    use std::io::{BufRead, BufReader};
    let lines = BufReader::new(file).lines().enumerate();
    for (i, line) in lines {
        println!("`{i}`: `{}`", line?);
    }
    Ok(())
}
tokio
async fn read_file_async() -> std::io::Result<()> {
    // #1
    use tokio::fs::OpenOptions;
    let mut open = OpenOptions::new();
    let file = open.read(true).open("./foo.txt").await?;
    // #2
    use tokio::io::{BufReader, AsyncBufReadExt}; 
    // ^ but not `use futures::AsyncBufReadExt;`
    let lines = BufReader::new(file).lines();
    // #3 - `LinesStream` is in a separate crate,
    // locked behind its own feature
    use tokio_stream::wrappers::LinesStream;
    let stream = LinesStream::new(lines);
    // #4 - for `enumerate()`?
    use futures::StreamExt; 
    // ^ but not `use tokio_stream::StreamExt;`
    let mut lines = stream.enumerate();
    while let Some((i, line)) = lines.next().await {
        // fairly intuitive, isn't it?
        println!("`{i}`: `{}`", line?);
    }
    Ok(())
}

This isn't about tokio alone. smol has its futures_lite, which reinvents the AsyncBufReadExt wheel yet again. async_std has its own set. tokio::pin! is not the same as std::pin::pin!, while tokio::join! seems identical to futures::join! - with no tokio parallel for futures::join_all at all.

Poll-based async/await is sufficiently hard as it is. There are more than enough variables to keep track of - especially given the next point. Complicating things further by duplicating / reinventing / rehashing the same few methods across a dozen different crates makes no sense.

To be perfectly clear: this isn't about opting in/out of nightly or unstable channels with their AsyncIterator and/or core::stream::Stream. Rather, it's about minimizing the amount of friction and cognitive load people must subject themselves to - newcomers and people familiar with only the sync side alike.

Solution/s
  • define a clear-cut standard of macro_rules! / traits in std::future / std::stream
  • extract / merge / re-export the most widely used ones into a single crate - fully independent from any given asynchronous runtime / executor / approach / philosophy at large

[3]

My memory might be playing a few tricks on me at this point, yet for some reason I still remember rather well a handful of comments with regards to the way this language handled assumptions. Especially the assumptions regarding the ability of the developer behind it to do the right thing.

One phrase in particular stuck out more than usual. It was -

The Pit of Success: in stark contrast to a summit, a peak, or a journey across a desert to find victory through many trials and surprises, we want our customers to simply fall into winning practices by using our platform and frameworks. To the extent that we make it easy to get into trouble we fail. - Falling Into The Pit of Success

From the borrow checker to the exclusive/mutable vs shared/read-only &'s to the Sync and Send markers: everything seemed to have been built around the same few core tenets.

  1. people are not that smart: no matter how strongly they might feel to the contrary
  2. they will make mistakes: regardless of the extent of their knowledge and experience
  3. it is not their fault alone: even the best craftsman can only do so much with a horrible tool

Which is a perfectly reasonable set of assumptions to hold.

Unless we're talking about async/await.

Here you're required to be [1] expert enough to [2] avoid all the mistakes you can possibly make while porting any and all of the blocking code you have into the realm of async/await; and should you fail at such a clearly trivial task - it is most certainly [3] your own fault alone. No exceptions.

If you fail at any of the three, things become even more interesting. Your code will compile perfectly fine. It will run perfectly fine. Some of the time. Until a section of your code that never once blocked during a standard #[test] run gets busy processing some abnormally large chunk of data.

Suddenly: things just freeze. Until they don't. Until they do again. Reproducible? Some of the time. Unexpected? Definitely. Infuriating? Always. If only you were [1] a tiny bit smarter you would have realized that it is absolutely critical for you to [2] never leave any section of code, no matter how seemingly transitory at a glance, to chance with regards to its ability to block on a given task / worker / thread. Unfortunately, [3] you're not that smart. async/await was never to blame.

Or was it?

  • why is the underlying impl Future in no way constrained by default?
  • how come the cancellation safety is entirely optional?
  • what is the async alternative to the std::thread::yield_now() call?
    tokio::task::yield_now() only adds an .await point to an existing async block
    (a simplified sketch of it follows just below); assuming the need to yield from
    an arbitrary execution point within - what's the way?
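
For reference - a minimal sketch of what tokio::task::yield_now() boils down to (simplified from memory, not tokio's exact source): a future that returns Pending exactly once and immediately re-wakes itself. Note that it can only take effect at an existing .await point, which is precisely the limitation in question:

use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll};

struct YieldNow {
    yielded: bool,
}

impl Future for YieldNow {
    type Output = ();

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.yielded {
            // second poll: other tasks have had their chance to run
            Poll::Ready(())
        } else {
            self.yielded = true;
            // re-schedule ourselves right away, then suspend
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}

async fn yield_now() {
    // usable only as `yield_now().await` - i.e. at an `.await` point
    YieldNow { yielded: false }.await
}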

Without any semblance of an enforced constraint and/or a preemptive capability of the underlying executor - there can't be an async "pit of success". It is far too easy to skim over a single while / loop / for loop; to forget (or never have found the perfect crate for) a non-blocking alternative to an otherwise perfectly valid piece of code; to lose track of the exact, potentially exponential or worse, number of CPU cycles until the next .await point within each and every Task.
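
To make that concrete - a hypothetical example of the kind of code that sails through review. Nothing stops an async fn from hogging its worker thread for as long as a loop runs, since there is no .await point inside it:

async fn checksum(items: &[u8]) -> u64 {
    let mut sum = 0u64;
    // pure CPU work: compiles fine, runs fine on small inputs - and
    // starves every other task on this worker once `items` becomes
    // abnormally large
    for &b in items {
        sum = sum.wrapping_mul(31).wrapping_add(b as u64);
    }
    sum
}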

Expecting people to do all of the above and more is not any different from expecting them to keep track of each and every raw pointer to each and every heap allocation across each and every thread they are ever going to work with. We all know how "well" that works for C/C++.

Solution/s? An implicit, configurable constraint (in ops/cycles) for each impl Future might be a good start, mirroring the implicit #[repr(Rust)] on any enum / union / struct declaration:

// (1) `restrained` by default = auto `.await` every X ops
#[restrained | restrained(ops: 50) | unrestrained]
async {
    async_fn_1().await;
    // (2) suspend in place or revert 
    // to the last `Poll::Pending` point
    std::task::yield_now(); 
    // (3) are we talking fibers at this point?
    sync_call_async_spawn();
}

fn sync_call_async_spawn() {
    let mut str = String::new();
    let task_current = std::task::current();
    let str_task_local = std::task::spawn_local(async {
        async_fn_3(&mut str).await;
        task_current.unpark();
    });
    // suspend + yield_now
    std::task::park(); 
}

Alternative option: a whole bunch of lints all over the crate, with clippy or similar. The linter would have to scan through the entire codebase and separate potentially (costly) blocking sections from the rest of the async code. Twisting people's arms into re-inspecting all of their projects just to stop a linter from screaming at them doesn't feel like the most sensible solution out there, however.

Bring your own suggestions and post them up/down below. I'm not particularly attached to any given spectrum of solutions. Only (ever so) mildly dissatisfied with the current status quo.


  1. What's even more amusing here is that I can clearly remember myself having the exact same issue a few years ago. Back when I had no clue about, or interest in, why std::pin::Pin<&mut Self> was the receiver of poll; when reading through the definition of the Future trait itself gave me a headache; and when trying to implement the trait from scratch myself seemed just insane. I don't think I ever quite managed to get that exact combination going, either. ↩︎


The issue with that is that the current AsyncRead traits of tokio don't play well with io-uring and cancellation. So it is unclear what an optimal solution would look like.

For your 3rd solution: who would count the ops? Anything counting from outside of the function would kill the embedded usage of async.
You say revert, but how would that work? What if I did a syscall? I don't think this is implementable without requiring the code to be pure (which wouldn't help ergonomics at all). When not reverting, you would have to use a stack (like Go does) or make the futures big enough to hold every possible state.

Also, on a bit of a philosophical level: even sync Rust doesn't really protect you from writing slow code. I remember a couple of times people went onto the Rust reddit to ask why their Python was faster, and usually it was because they didn't use buffered IO. There is nothing telling you to do that; you just have to know.
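
For illustration - a sketch of that exact pitfall, with a placeholder path; the logic is the same, the syscall counts are wildly different:

use std::fs::File;
use std::io::{BufRead, BufReader, Read};

// unbuffered: every `read()` call can be a syscall - the kind of code
// behind most "why is my Rust slower than Python" threads
fn count_lines_unbuffered(path: &str) -> std::io::Result<usize> {
    let mut file = File::open(path)?;
    let mut byte = [0u8; 1];
    let mut lines = 0;
    while file.read(&mut byte)? == 1 {
        if byte[0] == b'\n' {
            lines += 1;
        }
    }
    Ok(lines)
}

// buffered: `BufReader` amortizes syscalls over an internal buffer;
// nothing in the type system nudges you towards it - you just have to know
fn count_lines_buffered(path: &str) -> std::io::Result<usize> {
    let file = BufReader::new(File::open(path)?);
    Ok(file.lines().count())
}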


Yeah, I mostly agree. Async is really great - I love it and use it constantly - but it clearly was an afterthought. You are taking scheduling into your own hands, and that's a hard task. https://github.com/rust-lang/rfcs/pull/3782 might help. And on embedded we have async preemption, see Preface - Real-Time Interrupt-driven Concurrency

Many issues can be solved, but the cost would be too high. We could have effects and mark functions as "blocking", "terminating", etc., but too small a fraction of users really cares and wants to push for it - plus we are in a swamp of backwards compatibility. Imagine blocking functions were forbidden in async via effects, and one of your deps never got updated because its author met a bus and never marked it as non-blocking. And making Rust determine it automatically would be another hazard.


"You can't please everyone"[1] springs to mind from reading your post.

AFAIK, realtime code has never been a target Rust is chasing. (Your suggestion of restraint hints in that direction.)


  1. My search for the phrase led me to The miller, his son and the donkey - Wikipedia ↩︎

Depends on the definition and the nature of the ops in question. My first thought here was a basic "unpacking" of the source code behind a given async block - turning it into a linear sequence of pre-LLVM expressions, still on the Rust side. Manual "injection" of .await points, roughly parallel to tokio::task::yield_now(), every X "ops" would become fairly trivial then.
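
A purely hypothetical sketch of what such a restrained(ops: 50) lowering might inject into a hot loop (expensive_sync_step is a placeholder for real work):

// before: a plain loop with no `.await` in sight;
// after: a counter forcing a cooperative yield every 50 "ops"
async fn process_all(items: &[u64]) -> u64 {
    let mut acc = 0u64;
    let mut ops = 0u32;
    for &item in items {
        acc = acc.wrapping_add(expensive_sync_step(item));
        ops += 1;
        if ops % 50 == 0 {
            // the injected `.await` point
            tokio::task::yield_now().await;
        }
    }
    acc
}

fn expensive_sync_step(x: u64) -> u64 {
    x.wrapping_mul(x) // stand-in for the actual work
}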

Not sure I got your point about "everything outside" killing the embedded. Mind clarifying a bit?

Fair is fair. In theory, given that futures can be thought of as a state-machine enum with a handful of FnMut(State) -> State transitions associated with each state point, an interruption of the FnMut itself shouldn't affect the underlying enum; only the Pin<&mut Self> of the future.

In practice, some operations aren't easily reversible by any means. There are many ways to handle this: from an Option<RawDropTable> or similar, internal to the State of the future itself, plus an on_drop<F: Fn() + 'static>(f: F) exposed through the Context<'_> next to the current waker() - not too dissimilar from a generic scopeguard - to any other clever trick in the book.

Yes - it would mean that downstream implementations of the Future trait might become (slightly) more complex, as in addition to every other non-blocking concern the authors of the Future will have to consider the ability of the concurrent state machine they're engineering to clean up after itself. Yet that is precisely the point. Instead of shifting the blame and responsibility onto the end user of the library for not having sufficient insight into every possible extent of time the FnMut(S) -> S can run, the ball goes straight back into the court of the author of the Future and its poll.
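
For what it's worth, the closest approximation available today - with no new Context-level hook - is a guard whose Drop runs the compensating logic if the future gets dropped mid-operation (do_irreversible_part and roll_back are placeholders):

struct CleanupGuard<F: FnMut()> {
    armed: bool,
    on_cancel: F,
}

impl<F: FnMut()> Drop for CleanupGuard<F> {
    fn drop(&mut self) {
        if self.armed {
            // the future was dropped (cancelled) before completion
            (self.on_cancel)();
        }
    }
}

async fn guarded_step() {
    let mut guard = CleanupGuard { armed: true, on_cancel: roll_back };
    do_irreversible_part().await; // cancelled here? `Drop` compensates
    guard.armed = false;          // reached the end: disarm the guard
}

fn roll_back() { /* compensating logic */ }
async fn do_irreversible_part() { /* ... */ }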

This is not about speed. Slow code is consistently slow. async/await code in its current state might be fast today, slow tomorrow, and frozen outright some time during the next week. In the words of people much smarter than me: there are too many unknown unknowns baked into it. Should you miss any of them, things might work just fine for the longest time - until they don't.

Not to beat the dead horse for too long, yet the analogy remains the same. There is a reason for the "madness" of the borrow checker and the Send / Sync bounds when dealing with multiple threads. We could have told people that they "just have to know" the correct way to avoid the double frees and the segfaults and the data races and the rest. Heck, that's what C/C++ folks do all the time.

We don't. Because raw pointer tracking is error-prone, and one wrong use-after-free can mess up the entire project. Multi-threading is more subtle still and is a nightmare to even reproduce.

Why such a difference in the treatment of the async/await then?

"If everyone just did that" is seldom a sensible long-term strategy. Reasonable defaults ought to be reasonable by default. Expecting people to bend over backwards to annotate things all over might not be that viable, begin with. Backwards compatibility is a requirement, though.

We're not looking for an easy way out, are we?

Appreciated the links, thanks. Will give them a good long read one of these days.

Because async/await, from beginning to end, is buzzword-compliance first, everything else second. We already have a well-designed and well-tested solution to the problem you outlined - and it even works just fine without any async/await in sight! With a mix of different languages and runtimes (you can use C++, Rust, and Java in the same project).

The catch? It needs changes in the very foundation, in the OS kernel.

People don't like to change the foundation, thus they invented async/await (just look at the history on Wikipedia).

But if we invent some sort of hack that allows us to not change the foundation… then we automatically place ourselves into a tarpit of backward compatibility.

Rust did pretty decently with async/await, all things considered, but it's very much "an afterthought", as others noticed.

P.S. The interesting thing to note is that one may also imagine an entirely different design of the async world, built entirely on top of the AST, with no threads in sight, used or imagined… and that would also solve most problems. From what I understand, embedded users of async are trying to do something like that… and I wish them luck. But most users of async are, very much, beholden to the hack that tries to fit itself into the bad design of the OSes that exist… and that limits what designs may be available.

Rust async works on embedded because it can run on one thread. As soon as you add anything that has to run on a second thread (like a "hypervisor" that keeps track of the cost of the running future), you lose support for a lot of devices and use cases.

I don't really understand what ideas you had about the reversing of operations, but I believe every solution would either need completely pure functions (to allow rollbacks) or make the futures larger, to store the possible in-between states.

Are you running into deadlocks? I have seen some attempts to encode the locking order into the type system to stop them at compile time. I couldn't find the conference talk just now, but I found this crate: ordered_locks - Rust
The approach looks very similar to the one I remember from the talk.
I don't think everyone should be forced to use this, but it shows that if the program becomes too complex, there are possible solutions.

I got the impression it did exactly that. And that is why the original "green threads" implementation was thrown out. I see no reason Rust cannot be as real-time as C or C++, given the right platform and/or operating system support.

As a long-time developer of real-time embedded systems, I have never thought of async (cooperative scheduling) solutions as being a good fit for that. If you want guaranteed response times down to milliseconds or microseconds, you need a preemptive scheduler, priorities, interrupts and all that.


I can't help thinking that a lot of Rust users are using async unnecessarily.

My understanding of async has always been that it was a solution for building systems that are required to run hundreds of thousands of tasks, which spend a lot of their time waiting on I/O and have no lengthy compute task to perform - async being a way to do that without the memory overhead of hundreds of thousands of OS threads or their context-switching times.

How many Rust async users are really in that position?


That's obvious. Because async is around 90% buzzword compliance, maybe 10% substance. It was born as a band-aid for limited environments where threads are either inefficient (Windows) or unsupported (JavaScript, Python).

Then it got transplanted into something where it's a very bad fit.

The seductiveness of async/await lies in the fact that it's very easy to achieve impressive benchmark results with the async/await model… as long as what you are doing is very rigidly defined and as long as the developers are very disciplined.

But throw code of poor quality into the mix… and the whole thing starts falling apart. That's how Novell NetWare became history: it was the king of networks, with a massive lead in the beginning, when networks only provided file sharing and printer sharing.

But when applications started being deployed on top of it… it became a gamble: one misbehaving app could turn your server from an F1 racer into a cripple.

Now we are rediscovering all these lessons all over again.

Surprisingly enough, no. It was always a solution for the poor performance of threads in some OSes, or the total unavailability of threads in some languages.

Of course it was sold as a solution for a much wider class of problems… but that's not where it was invented and developed.

Not "may". In Rust, specifically, async/await is 90% buzzword-compliance, 10% everything else.


I don't think async is only useful for cases like that.
I currently write a desktop app, and I use async to do background work. It's really easy to queue tasks onto. Writing the background tasks is really simple, even when they need to call sleep or do IO, without worrying about blocking the whole thing.
Sure, I could spawn a thread every time I want to spawn a task, but that would be a lot less efficient, so I really like that I can do this easily.
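
For instance - a minimal sketch of that pattern, assuming a tokio runtime is already running somewhere in the app:

use std::time::Duration;

// queueing a background task from (sync) UI code: the spawned task
// can sleep or do IO without blocking anything else on the runtime
fn queue_background_work() {
    tokio::spawn(async {
        tokio::time::sleep(Duration::from_secs(2)).await;
        // ... the actual background IO would go here ...
        println!("background task done");
    });
}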

The Swift language is also going through this trying to add concurrency safety. Maybe both languages could investigate each other's solutions and see if there's a way to use those learnings?


I don't think this will achieve the stated goal, as these "ops" can take an arbitrarily long time (think e.g. if they are function calls).

You also need to be careful with inlining, as this needs to be done before any sync/blocking function is inlined into this async function. If you mistakenly introduce an .await point inside a sync/blocking function, then you risk unsoundness, since sync/blocking functions assume their stack won't be leaked (this is what allows std::thread::scope to be sound, for example!).
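
A concrete illustration of that soundness point - std::thread::scope is only sound because the (sync) frame cannot be abandoned before the spawned threads are joined:

use std::thread;

fn sum_in_parallel(data: &[i32]) -> i32 {
    let (left, right) = data.split_at(data.len() / 2);
    // `scope` guarantees both threads are joined before it returns,
    // which is what makes borrowing `data` across threads safe
    thread::scope(|s| {
        let l = s.spawn(|| left.iter().sum::<i32>());
        let r = s.spawn(|| right.iter().sum::<i32>());
        l.join().unwrap() + r.join().unwrap()
    })
    // if a tool injected an `.await` into this (inlined) sync code, the
    // enclosing future could be leaked (e.g. via `mem::forget`) at that
    // point - with the threads still borrowing `data`
}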


Async is, IMO, Rust's biggest drawback. You explained it well. It adds complexity that's not obvious at first, and fragments the library ecosystem due to compatibility problems. It provincializes code, when I think we should be moving towards more general, re-usable components. The lines drawn both between async and non-async code, and between the async implementations, are an obstacle to writing universal components and tools.


External supervision would be a massive overkill, go completely against the grain of the current design, and introduce a whole bunch of new problems no one really needs.

The chief point here is the need for some limit/restraint/constraint on the Future itself. Whether it's an external or internal one doesn't matter much: what matters is removing the need to get it "perfect" the first time around, or to endlessly scour the whole codebase for a rogue piece of blocking logic once things come to a screeching halt years later: as was the case here.

I have a rather strong inkling there's another, somewhat more efficient way between those two. Will play around with the idea one of these days. Could be some misplaced optimism, after all.

Another point I didn't take the time to flesh out: RTC is far from being the only kind of environment where interrupts and/or priorities and/or preemption make sense. Running a single unfortunate for x in oops_blocking()? shouldn't compromise the execution of the entire program at large.

My own has been roughly the same for a while. The only outstanding problem with that definition is the implicit requirement around every "lengthy compute task" - and people's ability to keep track of every individual length at large. Can it be done? Certainly. Will everybody do it the right way before they get a chance to realize they might have made a blunder? Extremely unlikely.

IMHO - it's not for us to judge. The use case for RefCell in an Rc - and its thread-safe counterpart, Mutex in an Arc - is fairly limited as well. As long as they're an official part of the language, they must work as intended, regardless. Ideally with all of the edge cases and accidental misuse accounted for.

Because it's exceptional, rather than normal.

Consider the following series of operations:

  • Write pending changes to file A.
  • Flush file A.
  • Modify file B to implement the pending changes.
  • Flush file B.

This is a classic write-ahead log pattern, and while it's somewhat rare in application-level code, it's something Rust supports. How do you make this task cancellation-safe?

In synchronous Rust, the answer is, you don't. If the program crashes or is terminated in the middle, then the next time the program starts, it recovers by re-applying pending changes from file A to file B before resuming normal operation: recovery is an application-level property, not a library-level one.

Async Rust, today, gives you the same answer. Even if each of those steps is, itself, cancellation-safe, the whole sequence is not. If you have a future representing this WAL-then-data update, and you drop it in the middle, then it will be cancelled, but cancellation-safety doesn't and can't come from Rust. It has to come from compensating logic in the application that understands that this operation could be interrupted and may need to be recovered or disposed of afterwards.

More generally, there is no known general construction whereby sequences of arbitrary cancellation-safe operations can be composed in cancellation-safe ways, so there's really no known way Rust, or a Rust-based library, can give developers cancellation-safety out of the box without developer-side effort.
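
A sketch of that sequence in async form, with placeholder paths (tokio used for concreteness). Each step can be individually cancellation-safe; the composed whole still isn't:

use tokio::fs::File;
use tokio::io::AsyncWriteExt;

async fn apply_with_wal(pending: &[u8]) -> std::io::Result<()> {
    // file A: the write-ahead log
    let mut wal = File::create("wal.log").await?;
    wal.write_all(pending).await?;
    wal.flush().await?;
    // dropped right here? file B never sees the change. The only fix is
    // application-level: replay `wal.log` on the next startup.
    // file B: the actual data
    let mut data = File::create("data.bin").await?;
    data.write_all(pending).await?;
    data.flush().await?;
    Ok(())
}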


I might disagree with that characterization, though I do see where you're coming from.

Sequential code is easier to read and frequently easier to verify, up to most casual levels of testing, than the alternatives are. If the number of concurrent tasks is guaranteed to be one, then you can write a system using sequential operations without touching async, and it'll work great.

But if the number of concurrent tasks might be two, then you either need to rewrite your program as a state machine yourself, have the language or a library produce the state machine for you (as async does), switch to threads, or force the problem back down to purely sequential operation.
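
For anyone who hasn't seen it spelled out - a rough, hand-rolled equivalent of the machine the compiler generates, heavily simplified (the real lowering also deals with pinning, captured borrows, and polling the inner futures):

use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll};

// roughly `async { step_one().await; step_two().await }` as an
// explicit enum: one variant per suspension point
enum TwoSteps {
    First,
    Second,
    Done,
}

impl Future for TwoSteps {
    type Output = ();

    fn poll(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        loop {
            match *self {
                // a real machine would poll an inner future here and, on
                // `Pending`, return while staying in this state
                TwoSteps::First => *self = TwoSteps::Second,
                TwoSteps::Second => {
                    *self = TwoSteps::Done;
                    return Poll::Ready(());
                }
                TwoSteps::Done => panic!("polled after completion"),
            }
        }
    }
}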

Threads are fine, but so is having the language produce a state machine from your sequential code. Neither is inherently superior in all cases. Having spent a fair bit of time in thread-per-request ecosystems such as Java and Python, I really appreciate having the option to not do that.


Calling it an afterthought is insulting to everyone who spent years designing, prototyping, and testing it.

It's one of the most painstakingly designed parts of Rust. It has some rough compromises, but it's rude to imply it was some "afterthought" or chasing a fad.

Async wasn't added before 1.0 because Rust first (unsuccessfully) tried having green threads. Futures were pursued only later, after it was clear that Rust was getting adoption in networking, and async was added only after extensive testing of library-only solutions had proven that built-in support is required for usability and performance. It's the result of a long, careful exploration, prototyping, and building on top of real-world solutions used in the ecosystem.
