Demystifying Async/Await - sort of ;)

Great tutorial!! :ok_hand: :ok_hand:

I'd add a quick mention of reactors, the mechanism that takes care of calling Waker::wake for us when actual I/O is involved. I'd also tweak the Waker part a bit, since it involves unsafe code:

  • it may be overly difficult for some people,

  • rightfully so, since doing it 100% right is very subtle and hard (you, for instance, like many people around, used the "mem::forget too late in unsafe code" anti-pattern, which is very potentially unsound :warning:).

So I suggest the following two things:

  1. Just explain that, conceptually, a Waker is basically:

    type Waker = Arc<dyn WakerMethods>;
    // where
    trait WakerMethods {
        fn wake (self: Arc<Self>);
        fn wake_by_ref (self: &'_ Arc<Self>);
    }
  2. Then, and only then, and using a drop-down block,

    • mdbook-supported example
      <details><summary>Description of the drop-down block</summary>
      [Contents of the drop-down block]
      </details>

    you can specify that `&'_ Arc<Self>` is actually not a valid trait-object receiver type because of the double indirection. Then explain that by using `Arc::{into,from}_raw`, one can collapse that double indirection into a single one, making the implementation of the trait object possible, but at the cost of hand-rolling everything, including the unsafe definition of a type-erased v(irtual function )table. You can also add some links and references there, since it's a blog-post / tutorial-worthy subject on its own.
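If it helps, the indirection-collapsing trick can be shown in isolation with a plain `Arc<String>` before any waker machinery appears (a sketch of mine, not from the tutorial; `round_trip` is a made-up name):

```rust
use std::sync::Arc;

/// Round-trips an `Arc` through a type-erased thin pointer, the way a
/// `RawWaker` stores its data pointer.
fn round_trip(arc: Arc<String>) -> Arc<String> {
    // `Arc::into_raw` gives up ownership and yields a *single-indirection*
    // pointer to the heap contents (the `ArcInner`):
    let ptr: *const String = Arc::into_raw(arc);
    // Type-erased: exactly what goes into `RawWaker::new(ptr, vtable)`.
    let erased: *const () = ptr.cast();
    // `Arc::from_raw` reconstitutes the `Arc`, restoring ownership
    // (and the obligation to eventually drop it):
    unsafe { Arc::from_raw(erased.cast::<String>()) }
}

fn main() {
    let arc = round_trip(Arc::new(String::from("hello")));
    assert_eq!(arc.as_str(), "hello");
    assert_eq!(Arc::strong_count(&arc), 1); // no leak, no double free
    println!("round trip ok");
}
```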

Then, within the implementation, use (an early) ManuallyDrop instead of (a late) forget, like I've done below, and feel free to go into great detail explaining everything.

```rust
use ::std::{
    mem::ManuallyDrop,
    sync::Arc,
    task::{Waker, RawWaker, RawWakerVTable},
};

/// Helper trait to implement the core waker logic.
/// The methods are defined in terms of each other: one can use
/// ownership, the other simply borrows itself,
/// so the appropriate one must be overridden.
trait WakerLogic : 'static + Sized {
    fn wake (self: Arc<Self>)
    {
        self.wake_by_ref();
    }

    // `&Arc` leads to double indirection,
    // which is not supported by trait objects,
    // so we add the `Sized` bound for now, and resort to
    // manually hand-rolling the
    // `type Waker = Arc<dyn WakerLogic>` type,
    // by taking advantage of `Arc::{into,from}_raw()` allowing us
    // to "collapse" `&Arc` into a single-indirection pointer with
    // the same semantics.
    fn wake_by_ref (self: &'_ Arc<Self>)
    {
        Arc::clone(self).wake();
    }
}

impl<T : WakerLogic> WakerTraitObject for T {}
trait WakerTraitObject : WakerLogic {
    fn into_raw_waker (self: Arc<Self>)
      -> RawWaker
    {
        let ptr: *const () = Arc::into_raw(self).cast();
        let vtable: &'static RawWakerVTable = &Self::WAKER_METHODS;
        RawWaker::new(ptr, vtable)
    }

    const WAKER_METHODS: RawWakerVTable = RawWakerVTable::new(
        // clone: unsafe fn(*const ()) -> RawWaker,
        {
            unsafe fn clone<T : WakerLogic> (ptr: *const ())
              -> RawWaker
            {
                let ptr: *const T = ptr.cast();
                // The other option would be to `Arc::increment_strong_count`
                // and manually recreate the `RawWaker` again.
                // Early `ManuallyDrop`: we only *borrow* the `Arc` here,
                // so its count must not be decremented at scope end.
                let arc: ManuallyDrop<Arc<T>> =
                    ManuallyDrop::new(unsafe { Arc::from_raw(ptr) });
                Arc::clone(&arc).into_raw_waker()
            }
            clone::<Self>
        },
        // wake: unsafe fn(*const ()),
        {
            unsafe fn wake<T : WakerLogic> (ptr: *const ())
            {
                let ptr: *const T = ptr.cast();
                // By value: take back ownership of the `Arc`.
                let arc: Arc<T> = unsafe { Arc::from_raw(ptr) };
                arc.wake();
            }
            wake::<Self>
        },
        // wake_by_ref: unsafe fn(*const ()),
        {
            unsafe fn wake_by_ref<T : WakerLogic> (ptr: *const ())
            {
                let ptr: *const T = ptr.cast();
                let arc: ManuallyDrop<Arc<T>> =
                    ManuallyDrop::new(unsafe { Arc::from_raw(ptr) });
                WakerLogic::wake_by_ref(&arc);
            }
            wake_by_ref::<Self>
        },
        // drop: unsafe fn(*const ()),
        {
            unsafe fn drop<T : WakerLogic> (ptr: *const ())
            {
                let ptr: *const T = ptr.cast();
                let arc: Arc<T> = unsafe { Arc::from_raw(ptr) };
                ::core::mem::drop(arc);
            }
            drop::<Self>
        },
    );

    fn into_waker (self: Arc<Self>)
      -> Waker
    {
        unsafe {
            Waker::from_raw(self.into_raw_waker())
        }
    }
}

impl WakerLogic for Thought {
    fn wake (self: Arc<Thought>)
    {
        let send_thought_back = self.send_thought_back.clone();
        let thought = self;
        if let Err(closed_channel) = send_thought_back.send(thought) {
            // stuff, e.g., ignore or panic
            panic!("Channel was closed: {}", closed_channel);
        }
    }
}
```
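To see the same vtable dance end-to-end in a compilable form, here is a self-contained sketch along the same lines, with a made-up `Flag` type standing in for `Thought` (it sets an atomic flag instead of sending itself over a channel):

```rust
use std::{
    mem::ManuallyDrop,
    sync::{
        atomic::{AtomicBool, Ordering},
        Arc,
    },
    task::{RawWaker, RawWakerVTable, Waker},
};

/// Toy waker state: being woken just sets a flag.
struct Flag(AtomicBool);

const VTABLE: RawWakerVTable =
    RawWakerVTable::new(clone, wake, wake_by_ref, drop_);

unsafe fn clone(ptr: *const ()) -> RawWaker {
    // Early `ManuallyDrop`: we only borrow the `Arc` here, so its count
    // must not be decremented when this scope ends.
    let arc = ManuallyDrop::new(unsafe { Arc::from_raw(ptr.cast::<Flag>()) });
    let cloned: Arc<Flag> = Arc::clone(&arc);
    RawWaker::new(Arc::into_raw(cloned).cast(), &VTABLE)
}

unsafe fn wake(ptr: *const ()) {
    // By value: take ownership back; the `Arc` is dropped at scope end.
    let arc = unsafe { Arc::from_raw(ptr.cast::<Flag>()) };
    arc.0.store(true, Ordering::SeqCst);
}

unsafe fn wake_by_ref(ptr: *const ()) {
    let arc = ManuallyDrop::new(unsafe { Arc::from_raw(ptr.cast::<Flag>()) });
    arc.0.store(true, Ordering::SeqCst);
}

unsafe fn drop_(ptr: *const ()) {
    drop(unsafe { Arc::from_raw(ptr.cast::<Flag>()) });
}

/// Hand-rolls a `Waker` out of the vtable above.
fn flag_waker(flag: &Arc<Flag>) -> Waker {
    let ptr = Arc::into_raw(Arc::clone(flag)).cast();
    unsafe { Waker::from_raw(RawWaker::new(ptr, &VTABLE)) }
}

fn main() {
    let flag = Arc::new(Flag(AtomicBool::new(false)));
    let waker = flag_waker(&flag);
    let waker2 = waker.clone(); // exercises the vtable's `clone`
    waker.wake(); // consumes the waker and sets the flag
    assert!(flag.0.load(Ordering::SeqCst));
    drop(waker2); // exercises the vtable's `drop`
    assert_eq!(Arc::strong_count(&flag), 1); // no leaks, no double frees
    println!("woken");
}
```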

In that regard, after you mentioned the channel / queue stuff, you forgot to provide the actual implementation of Wake{r,able}::wake, which I think would provide a very nice insight into how the "loop" is tied :slightly_smiling_face:


Thanks for your response,
I’ll check how to best incorporate your suggestions, and yes, somehow I missed providing the wake method :man_facepalming:t2:

This seems a little off to me. “Concurrency,” as I understand it, is about the two or more cooperating logical processes, regardless of where they’re running. It’s about things like mutexes, agreement algorithms, atomic operations, and the like.

This is about right, but it’s important to note that parallel processing could involve very little concurrency. Pixel shaders running on a graphics card, for example, are incredibly parallel but don’t have many concurrency concerns: each process calculates its output without communicating with the others at all.

Something like a Tokio executor running on a single thread is concurrency without parallelism, because it’s scheduling all the work on the same device.

If you have a device that can do some work on its own, like a network card that can transmit a packet from a buffer, it would be nice to let the main processor do some work while the network card is busy sending the packet. Asynchronous programming is the pattern of submitting a job to a device like this and then doing something else until it notifies you that it’s done and ready for another job. This is a very limited kind of parallel processing, where the other processor is often a much less powerful, but specialized, piece of equipment.

Programming languages’ async systems are mostly designed to organize what should happen next when these notifications come in. In Rust, the Future provided by the I/O interface saves the Waker somewhere and wakes it up when the hardware has finished its job, which makes the executor queue the task to be continued.
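That save-the-Waker-then-wake dance can be sketched in a few dozen lines, with a helper thread standing in for the hardware and a park/unpark loop standing in for the executor (`TimerFuture`, `block_on`, and friends are made-up names for illustration, not the API of any real runtime):

```rust
use std::{
    future::Future,
    pin::Pin,
    sync::{Arc, Mutex},
    task::{Context, Poll, Wake, Waker},
    thread,
    time::Duration,
};

/// State shared between the future (polled by the executor) and the
/// "device" (here: a helper thread standing in for hardware).
#[derive(Default)]
struct Shared {
    done: bool,
    waker: Option<Waker>,
}

/// A future that completes once the "device" reports it is done.
struct TimerFuture {
    shared: Arc<Mutex<Shared>>,
}

impl Future for TimerFuture {
    type Output = ();

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        let mut shared = self.shared.lock().unwrap();
        if shared.done {
            Poll::Ready(())
        } else {
            // Save the Waker so the "device" can notify the executor later.
            shared.waker = Some(cx.waker().clone());
            Poll::Pending
        }
    }
}

/// Starts the "device" job and returns the future tracking it.
fn timer(dur: Duration) -> TimerFuture {
    let shared = Arc::new(Mutex::new(Shared::default()));
    let for_device = Arc::clone(&shared);
    thread::spawn(move || {
        thread::sleep(dur); // the device doing its job
        let mut shared = for_device.lock().unwrap();
        shared.done = true;
        if let Some(waker) = shared.waker.take() {
            waker.wake(); // makes the executor queue the task again
        }
    });
    TimerFuture { shared }
}

/// Minimal single-task executor: park the thread until woken, then re-poll.
struct Unpark(thread::Thread);

impl Wake for Unpark {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(Unpark(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(out) => return out,
            Poll::Pending => thread::park(),
        }
    }
}

fn main() {
    block_on(timer(Duration::from_millis(50)));
    println!("timer fired");
}
```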

These terms are very muddled. And I guess one's interpretation is colored by whether one is looking at it from a purely abstract programmer/programming-language point of view or from an actual hardware, physical-reality point of view.

If we look at it literally "current" or "currently" means happening now. And "concurrent" is about two or more things happening at the same time. Physics dictates that those two or more things cannot be happening in the same point in space at the same time. (Let's ignore Quantum Mechanics for now)

Clearly a single processor cannot be running two programs at the same time. So there can be no "concurrent" programs. It's either running one or the other at any moment.

And yet, that is what the computer science world has called the time slicing that goes on in multi-tasking, multi-user operating systems, that give the illusion that many programs are running "concurrently". It's an abstraction, it is not physical reality.

Meanwhile "parallel processing" really does emphasize that there really are many things going on at the same time. In different physical places, i.e. different cores or even machines. Those things are happening "concurrently" in everyday language but not in the computer science world.

Well, I tend to disagree, based on the definition of concurrency and parallelism here. Reading it, this feels quite intuitive to me.

In a nutshell: concurrent processing means multiple tasks are in flight at the same time, but not executed simultaneously.

Parallel processing means multiple tasks (or parts of them) are executed simultaneously.

Which means single-core but multi-threaded architectures support concurrent processing, but not parallel processing; parallel processing is only possible on multi-core architectures.
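For what it's worth, "concurrent but never parallel" can be demonstrated in a few lines of Rust: a made-up round-robin executor (`run_all` and `YieldOnce` are names invented for this sketch) interleaves two tasks on the current thread, so both are in flight at once while nothing ever runs simultaneously:

```rust
use std::{
    future::Future,
    pin::Pin,
    sync::{Arc, Mutex},
    task::{Context, Poll, Wake, Waker},
};

/// A future that returns `Pending` once before completing:
/// a cooperative yield point.
struct YieldOnce(bool);

impl Future for YieldOnce {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            Poll::Ready(())
        } else {
            self.0 = true;
            cx.waker().wake_by_ref(); // stay scheduled
            Poll::Pending
        }
    }
}

/// Our executor re-polls everything anyway, so waking is a no-op here.
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Polls all tasks round-robin on the current thread until all are done:
/// the tasks are concurrent (their steps interleave) but never parallel.
fn run_all(tasks: Vec<Pin<Box<dyn Future<Output = ()>>>>) {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut slots: Vec<_> = tasks.into_iter().map(Some).collect();
    while slots.iter().any(Option::is_some) {
        for slot in &mut slots {
            if let Some(task) = slot {
                if task.as_mut().poll(&mut cx).is_ready() {
                    *slot = None;
                }
            }
        }
    }
}

fn main() {
    let log = Arc::new(Mutex::new(Vec::new()));
    let (a, b) = (Arc::clone(&log), Arc::clone(&log));
    let tasks: Vec<Pin<Box<dyn Future<Output = ()>>>> = vec![
        Box::pin(async move {
            a.lock().unwrap().push("A1");
            YieldOnce(false).await;
            a.lock().unwrap().push("A2");
        }),
        Box::pin(async move {
            b.lock().unwrap().push("B1");
            YieldOnce(false).await;
            b.lock().unwrap().push("B2");
        }),
    ];
    run_all(tasks);
    // Both tasks made progress "at the same time" on a single thread:
    assert_eq!(*log.lock().unwrap(), ["A1", "B1", "A2", "B2"]);
    println!("{:?}", *log.lock().unwrap());
}
```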


Exactly, you tend to disagree with me after reading that article, but the article says exactly what I said:

Concurrency means executing multiple tasks at the same time but not necessarily simultaneously.

(Which is an odd way to put it anyway because for normal English speakers "simultaneous" means "at the same time" and yet it states otherwise).

The article confuses things by saying: the CPU will decide to run a task first and then the other task or run half a task and half another task, etc.

Well, no, the CPU decides nothing, the operating system does. Or your language run time if there is no OS.

Obviously, from a high-level, user point of view, a single processor can appear to be running two or more of what we call programs, or tasks within programs, at the same time. They get started, there is a period of time when they seem to be running "concurrently", then they end. Although at a low, hardware level there is nothing "concurrent" about it.

Like I said, it all depends on the level of abstraction one is looking at. Sometimes software engineers have to be reminded of the actual machine they are running on and the laws of Physics.

That “necessarily” is important. Concurrency theory (yes, there really is such a thing) applies whenever you have multiple logical processes that make progress independently of each other. It doesn’t matter if they’re trading off timeslices on a single processor, running on multiple CPU cores in a shared address space, or running in different datacenters half a world apart— perils like deadlocks and data races affect them all.

Like many bits of jargon, computer science’s “concurrency” started out meaning simply “at the same time,” but picked up additional technical details over time as the theory developed. Now it’s a specific field of study related to certain concerns that things happening at the same time have, and you have to use the context to figure out which meaning is intended :confused:.

Well, I'm really not a native English speaker; however, the linked blog tries to differentiate "same time" and "same time instant". I'm not sure how subtle those differences are, and that might be the reason I'm struggling with the different definitions :open_mouth: .

My take-away is that there clearly are subtly different interpretations possible, and I'll try to point this out in one way or another in the introduction chapter, trying not to be more academic than I need to :wink:

The important thing to avoid is implying that something is either parallel or concurrent. Most of the time, parallel programs are also concurrent, and the terms are referring to different aspects of the same thing:

  • Parallelism refers to which pieces of hardware are doing which parts of the work
  • Concurrency refers to how multiple logical processes coordinate with each other

Situations that are one of these but not the other are relatively rare, which is why it’s so hard to tell them apart sometimes. Even those of us in academia slip up reasonably often.

Yes indeed.

I'm no expert but I guess that all goes back to the likes of Tony Hoare and his work on "Communicating Sequential Processes".
Tony Hoare >> Contributions >> CSP.

Oddly enough the definition of "concurrency" on that Stanford page (See What is concurrency?) uses the analogy of doing the washing and scheduling the washer and the drier for optimum throughput. Which is to say it depends on parallel processing in multiple hardware units.

Even the Stanford CS Profs are confused by this :slight_smile:


I'm starting to think every software engineer, especially programming language designer, should take a month or two out and do something in a hardware description language like Verilog or VHDL.

They kind of look like C or Ada respectively, but our notions of sequential/parallel are inverted.

In such HDLs every statement is something that can happen at the same instant as every other. What looks like a sequence of statements is a parallel execution of operations. If you want things to happen sequentially you have to go out of your way to make it so.

I'm not certain that Verilog or VHDL would be sufficient, though it certainly would help convey what really happens "when the rubber hits the road".

Reason for my statement: I once worked on a chip design with a hardware electronics engineer who literally could not comprehend concurrent operation. Needless to say his designs always had multiple timing hazards that others had to remove.

One of the main promises of asynchronous, parallel processing is that it can increase overal performance of your program. Although this does not apply to any generic situation[add: ,] most of us would like to try out whether this could give the desired boost [remove: or not] to your specific challange[replace with: challenge]. However, designing and writing a program[remove: ,] ready to benefit from asynchronous execution is often challenging and sometimes requires knowledge of how the underlining[replace with: underlying] bits and peaces[replace: pieces] fit together.

Some editing to make meaning conveyed clearer and some spelling corrections.

Is there a repo to make pull requests against when I find the occasional typo, or should I post a suggested improvement here?

Well, you can create a PR against this repo:


Thanks for the support.



I've refactored the waker chapter based on your (@Yandros) proposal. I also find it a bit clearer now :wink:
However, I don't fully grasp the double Arc indirection you mentioned. I thought the RawWaker was required to get the type-erased representation of the trait object of the actual Waker/Wakeable, so I did not touch anything on that double-indirection thingy :blush:

Would you consider it to be important to be mentioned as a reason for the VTable and RawWaker things?


The idea is that, given the conceptual `type Waker = Arc<dyn WakerMethods>` picture from earlier:


why aren't things like so in practice? Why the whole unsafe RawWaker and RawWakerVTable ordeal?

That would be a valid concern from a reader, and not directly answering that question will keep some of that halo of mystery around async / .await that your blog post tries to dispel. So, answering it is both nice, and ... complicated. Hence the suggestion for a drop-down block. This way, quick readers can keep the mystery, but know there is a place they can refer to if and when they wish to elucidate it :slightly_smiling_face:

The limitation, in and of itself, is quite basic: the above code causes the following compilation error:

```text
error[E0038]: the trait `WakerMethods` cannot be made into an object
 --> src/
3 | trait WakerMethods {
  |       ------------ this trait cannot be made into an object...
4 |     fn wake (self: Arc<Self>);
5 |     fn wake_by_ref (self: &'_ Arc<Self>);
  |                           -------------
  |                           |
  |                           ...because method `wake_by_ref`'s `self` parameter cannot be dispatched on
  |                           help: consider changing method `wake_by_ref`'s `self` parameter to be `&self`: `&Self`
```

Which contradicts the "intuition" that this can be implemented. And the reason for that is that `Arc` is not a completely honest abstraction: a good way of seeing what an `Arc` is, is as a `PseudoBox<ArcInner<T>>`, where `ArcInner` is just the `T` and the atomic counters put together.

So, if we had such hypothetical types, we could define WakerMethods as:

```rust
trait WakerMethods {
    fn wake (self: PseudoBox<ArcInner<Self>>);
    fn wake_by_ref (self: &'_ ArcInner<Self>);
}
```

and both methods would be valid-to-use-as-virtual-methods / valid-to-use-in-a-trait-object / object-safe / dyn-safe, since both `self` usages above would have the layout of a single, direct pointer.

But since we don't have such hypothetical types, we end up having to do it manually:

```rust
trait RawWakerMethods {
    /// The implementation interprets `self` as `Arc<Self>`.
    fn wake (self: *mut Self);

    /// The implementation interprets `self` as `&'_ ArcInner<Self>`.
    fn wake_by_ref (self: *const Self);

    /* Plus compiler-generated:
    fn clone (self: *const Self) -> *mut Self { ... }
                    ^^^^^^^^^^^     ^^^^^^^^^
             &'_ ArcInner<Self>     Arc<Self>

    fn drop (self: *mut Self) { ... }
    */
}
```
And since the above cannot really be written as a trait either, we end up manually hand-rolling these four virtual methods (the `RawWakerVTable`), which are then bundled with the raw-ified `Arc<Self>` as a `*const ()`, yielding the `RawWaker`.
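As a closing aside: since Rust 1.51 the standard library ships exactly this hand-rolled machinery behind a safe interface, the `std::task::Wake` trait, so the conceptual `Arc<dyn WakerMethods>` picture can nowadays be written almost literally (a sketch; `Counter` is a made-up example type):

```rust
use std::{
    sync::{
        atomic::{AtomicUsize, Ordering},
        Arc,
    },
    task::{Wake, Waker},
};

/// Toy waker state: counts how many times it has been woken.
struct Counter(AtomicUsize);

impl Wake for Counter {
    fn wake(self: Arc<Self>) {
        self.0.fetch_add(1, Ordering::SeqCst);
    }
    // `wake_by_ref(self: &Arc<Self>)` has a default implementation;
    // std performs the `Arc::{into,from}_raw` vtable dance internally.
}

fn main() {
    let counter = Arc::new(Counter(AtomicUsize::new(0)));
    // Safe construction, no `unsafe` in sight:
    let waker: Waker = Waker::from(Arc::clone(&counter));
    waker.wake_by_ref();
    waker.wake();
    assert_eq!(counter.0.load(Ordering::SeqCst), 2);
    println!("woken twice");
}
```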


Thanks for the discussion, and the book. I am trying to follow the Rust community and the language itself, especially everything concurrent. But alas, I have not coded a line of Rust! And to be honest, I have not read all the words above, either. (But I have reviewed papers on Rust.)

The purpose of this comment is to point out that there is also "another half of the field": the synchronous one. I will refer to matters out of scope for Rust per se, not meaning to harm or discredit any of the above, which seems right in its own context.

But I do have experience with languages that support concurrency: namely occam 2 in the nineties, Go (some, when it arrived), and now XC. Their concurrency (and even parallelism, for parts of occam 2 and XC) has a theoretical foundation in the Communicating Sequential Processes (CSP) process algebra.

Those languages are basically synchronous. Should one need asynchronism, it's built on top. Keeping libraries out of the reasoning: in occam and Go I would, in order to add asynchronism, add extra PROCs or goroutines; in XC it would be with the help of a so-called interface, with several patterns possible, to get help from the compiler. But under the hood it's synchronous, or state machines, or locks, or use of different types of tasks (standard, combinable and distributable). For the latter one can also set time constraints by pragmas; it's possible to build multi-core (or rather multi-logical-core) deterministic processing. From a log: "Pass with 14 unknowns, Num Paths: 8, Slack: 400.0 ns, Required: 1.0 us, Worst: 600.0 ns, Min Core Frequency: 300 MHz." When other tasks are also running.

Sometimes asynchronicity is necessary (most often close to I/O) and sometimes it's nice, but often synchronous systems are just as fast and responsive. Some would argue: safer. And reasoning seems most often to be done with synchronous systems to reason about (like CSP).

Blocking is OK, and not pathological, provided it doesn't make other jobs wait pathologically. So it often has to do with parallel granularity (not my phrase). With two threads in a large system and ten independent jobs, blocking does not seem acceptable. But with 10 threads (tasks, processes) and 10 jobs, reasoning seems easier.

I have blogged quite a lot about this over the years. Disclaimer: no ads, money, gifts etc. So I take the opportunity to link a few relevant notes here.

The first is Rich Hickey's lecture on core.async: Clojure core.async at my blog note on clojure core.async.

I have a note where I try to discuss what "blocking" might imply, see Not so blocking after all.

Then I compare a lecture by C++'s Herb Sutter and Go's Rob Pike: search for "Pike & Sutter: Concurrency vs. Concurrency" in my TECH NOTES. (Also commented on by Herb Sutter.)

André: a typo in About: "with it's promises" → "with its promises", i.e. "its" (not "it is").


Thanks for putting this together.


This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.