Demystifying Async/Await - sort of ;)

Hi there,

after quite a while with Rust and the lovely community here, I thought it might be time to give something back based on my own learnings and experience so far. For me, the hardest topic to grasp at the very beginning was how all the async/await puzzle pieces fit together and what drives async functions to completion.

The result is a small online book that I've written. I hope everyone who is as curious about this topic as I am may find it helpful.

In case of any typos or other issues with the language (English is not my mother tongue), please feel free to comment and I'll try to fix them as soon as possible :slight_smile:

Thanks for your time and reading...

17 Likes

I like how you draw upon the nature of brain and mind to drive your creative abstraction. This is one of my favorite methods. Nature is wise, so we ought to model after her.

1 Like

thx for spotting this :slight_smile:

Excellent.

Though I might nitpick the introduction:

One of the main promises of asynchronous, parallel processing is that it can increase overal [overall] performance of your program.

Apart from the spelling error, this seems a bit vague to me.

We have concurrent processing, generally described as two or more processes that start and run in an interleaved fashion through context switching.

We have parallel processing where two or more processes actually run at the same time on different cores/machines.

Clearly concurrent processing will lower overall performance thanks to all that added context switching. Only parallel processing can add performance, by actually getting more hardware on the job. But it too suffers from the overheads of scheduling and communicating between tasks.

Then we have asynchronous programming which is kind of orthogonal to either of the above and may make use of both.

The promise of asynchronous programming then, in my view, is to claw back the resources and performance lost to having lots of task contexts and the resulting context switching.

As someone nicely put it "Synchronous processing is for when you have a lot of work to do, async is for when you have a lot of waiting to do"
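
To make that concrete, here is a minimal sketch of "a lot of waiting" being handled concurrently on a single thread. (This assumes the tokio crate with the "full" feature, purely for illustration; any executor with a timer would do.)

use std::time::{Duration, Instant};

// Three one-second "waits" overlap on a single thread, so the whole run
// takes about one second rather than three. No parallelism is involved;
// the executor simply switches between tasks whenever they are waiting.
#[tokio::main(flavor = "current_thread")]
async fn main() {
    let start = Instant::now();
    let wait = || tokio::time::sleep(Duration::from_secs(1));
    tokio::join!(wait(), wait(), wait());
    println!("elapsed: {:?}", start.elapsed()); // ~1s, not ~3s
}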

The Rust Async Book notes:

It's important to remember that traditional threaded applications can be quite effective, and that Rust's small memory footprint and predictability mean that you can get far without ever using async. The increased complexity of the asynchronous programming model isn't always worth it, and it's important to consider whether your application would be better served by using a simpler threaded model.

Having said all that....

I think the idea of an operating system on the Raspberry Pi, or pretty much anywhere, that supports non-blocking system calls and async programming throughout is an excellent idea.

Proof of the pudding, as it were, is the Espruino Javascript run-time for micro-controllers, which makes it dead easy to write simple code to respond to all kind of GPIO, timer, sensor, and other inputs all within the event-driven model that is Javascript.

Thanks for your valuable input. Would it be fine with you if I take some parts of your explanation and integrate them into the introduction? Especially the concurrent, parallel and async processing part?
I just wrote this part as this is the "assumption/expectation" I came across in several posts regarding this topic - questions like "I've done it async but why is it still not more performant than before?".

But it definitely makes sense to spend some more words on it and get it out of the vague :slight_smile:

Thx again for reading and your feedback

Sure why not.

Steve Klabnik has a couple of excellent presentations on YouTube introducing async/await, and in at least one of them he makes a point of defining concurrent, parallel, and asynchronous in some detail before he starts.

"The Talk You've Been Await-ing for" : https://www.youtube.com/watch?v=NNwK5ZPAJCk
"Rust's Journey to Async/Await" : https://www.youtube.com/watch?v=lJ3NC-R3gSI

After seeing those I started thinking about tinkering with building an executor. Looks like you have saved me a lot of head scratching!

I'll be trying to find the time to work through it all.

Great tutorial!! :ok_hand: :ok_hand:

I'd add: a quick mention of reactors, the mechanism that, when actual IO is involved, takes care of calling Waker::wake for us (see the small sketch after the bullets below); I'd also tweak the Waker part a bit, since it involves unsafe code:

  • it may be overly difficult for some people,

  • rightfully so, since doing it 100% right is very subtle and hard (you, for instance, like many people around, used the "mem::forget too late in unsafe code" anti-pattern, which is potentially unsound :warning:).
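
Back to the reactor point: conceptually it can be as small as the following sketch. Everything here is hypothetical; a plain thread plus a timer stand in for the OS I/O readiness notification.

use ::std::{
    sync::{Arc, Mutex},
    task::Waker,
    thread,
    time::Duration,
};

// Toy "reactor": the leaf future stores its `Waker` in the shared slot
// when polled, and this background thread calls `wake()` once the fake
// "I/O" (a timer, here) has completed.
fn spawn_toy_reactor (waker_slot: Arc<Mutex<Option<Waker>>>)
{
    thread::spawn(move || {
        thread::sleep(Duration::from_millis(100)); // "I/O" in flight
        if let Some(waker) = waker_slot.lock().unwrap().take() {
            waker.wake(); // this is what re-enqueues the task
        }
    });
}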

So, for the Waker part, I suggest the following two things:

  1. Just explain that, conceptually, a Waker is basically:

    type Waker = Arc<dyn WakerMethods>;
    // where
    trait WakerMethods {
        fn wake (self: Arc<Self>);
        fn wake_by_ref (self: &'_ Arc<Self>);
    }
    
  2. Then, and only then, and using a drop-down block,

    • mdbook-supported example
      <details><summary>Description of the drop-down block</summary>
      
      [Contents of the drop down block]
      
      </details>
      

    you can specify that &'_ Arc<Self> is actually not a valid trait object receiver type because of the double indirection. Then you can explain that by using Arc::{into,from}_raw, one can collapse that double indirection into a single one, making the implementation of the trait object possible, but at the cost of hand-rolling everything, including the unsafe definition of a type-erased V(irtual function)Table. That is also a good place to add some links and references, since it's a blog-post / tutorial-worthy subject on its own.

Then, within the implementation, go and use (an early) ManuallyDrop instead of (a late) forget like I've done below, and feel free to go into great detail explaining everything.

Example
use ::std::{
    mem::ManuallyDrop,
    sync::Arc,
    task::{Waker, RawWaker, RawWakerVTable},
};

/// Helper trait to implement the core waker logic.
///
/// The method can use ownership, or simply borrow itself,
/// so the appropriate one must be overriden.
trait WakerLogic : 'static + Sized {
    fn wake (self: Arc<Self>)
    {
        Self::wake_by_ref(&self);
    }

    // `self: &Arc<Self>` is not a valid method receiver type (E0307),
    // and the double indirection it implies is not supported by trait
    // objects anyway, so we take the `Arc` as a plain parameter here,
    // and resort to manually hand-rolling the
    // `type Waker = Arc<dyn WakerLogic>` type,
    // by taking advantage of `Arc::{into,from}_raw()` allowing us
    // to "collapse" `&Arc` into a single-indirection pointer with
    // the same semantics
    fn wake_by_ref (this: &'_ Arc<Self>)
    {
        Arc::clone(this).wake();
    }
}

impl<T : WakerLogic> WakerTraitObject for T {}
trait WakerTraitObject : WakerLogic {
    fn into_raw_waker (self: Arc<Self>)
      -> RawWaker
    {
        let ptr: *const () = Arc::into_raw(self).cast();
        // `&Self::WAKER_METHODS` gets promoted to the `&'static` borrow
        // that `RawWaker::new` requires
        RawWaker::new(ptr, &Self::WAKER_METHODS)
    }

    const WAKER_METHODS: RawWakerVTable = RawWakerVTable::new(
        // clone: unsafe fn(*const ()) -> RawWaker,
        {
            unsafe
            fn clone<T : WakerLogic> (ptr: *const ())
              -> RawWaker
            {
                let ptr: *const T = ptr.cast();
                // other option would be to `Arc::increment_strong_count`
                // and manually recreate the `RawWaker` again.
                // `ManuallyDrop` prevents the strong count from being
                // decremented here: we only *borrow* the `Arc`.
                let arc: ManuallyDrop<Arc<T>> =
                    ManuallyDrop::new(Arc::from_raw(ptr))
                ;
                Arc::clone(&*arc).into_raw_waker()
            }
            clone::<Self>
        },
        // wake: unsafe fn(*const ()),
        {
            unsafe
            fn wake<T : WakerLogic> (ptr: *const ())
            {
                let ptr: *const T = ptr.cast();
                let arc: Arc<T> = Arc::from_raw(ptr);
                arc.wake();
            }
            wake::<Self>
        },
        // wake_by_ref: unsafe fn(*const ()),
        {
            unsafe
            fn wake_by_ref<T : WakerLogic> (ptr: *const ())
            {
                let ptr: *const T = ptr.cast();
                let arc: ManuallyDrop<Arc<T>> =
                    ManuallyDrop::new(Arc::from_raw(ptr))
                ;
                T::wake_by_ref(&*arc);
            }
            wake_by_ref::<Self>
        },
        // drop: unsafe fn(*const ()),
        {
            unsafe
            fn drop<T : WakerLogic> (ptr: *const ())
            {
                let ptr: *const T = ptr.cast();
                let arc: Arc<T> = Arc::from_raw(ptr);
                ::core::mem::drop(arc);
            }
            drop::<Self>
        },
    );
    
    fn into_waker (self: Arc<Self>)
      -> Waker
    {
        unsafe {
            Waker::from_raw(self.into_raw_waker())
        }
    }
}

impl WakerLogic for Thought {
    fn wake (self: Arc<Thought>)
    {
        let send_thought_back = self.send_thought_back.clone();
        let thought = self;
        if let Err(closed_channel) = send_thought_back.send(thought) {
            // stuff, e.g., ignore or panic
            panic!("Channel was closed: {}", closed_channel);
        }
    }
}
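
And to show how that "loop" is tied together, here is a rough sketch of the executor side. The shape of Thought is hypothetical (inferred from your book: the future being driven, bundled with the sending end of the run queue), and error handling is elided.

use ::std::{
    future::Future,
    pin::Pin,
    sync::{mpsc, Arc, Mutex},
    task::Context,
};

// Hypothetical shape of `Thought`, just so this sketch is self-contained.
struct Thought {
    future: Mutex<Pin<Box<dyn Future<Output = ()>>>>,
    send_thought_back: mpsc::Sender<Arc<Thought>>,
}

// Executor side: pop a `Thought` off the queue and poll it with the
// `Waker` built above. Since `wake()` sends the `Thought` back onto the
// queue, a pending-but-woken future gets picked up and polled again.
fn poll_thought (thought: Arc<Thought>)
{
    let waker = Arc::clone(&thought).into_waker();
    let mut cx = Context::from_waker(&waker);
    // a real executor would check for `Poll::Ready` and drop the task
    let _ = thought.future.lock().unwrap().as_mut().poll(&mut cx);
}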

In that regard, after you mentioned the channel / queue stuff, you forgot to provide the actual implementation of Wake{r,able}::wake, which I think would provide a very nice insight into how the "loop" is tied together :slightly_smiling_face:

4 Likes

Thanks for your response,
I’ll check how to best incorporate your suggestions, and yes, somehow I missed providing the wake method :man_facepalming:t2:

This seems a little off to me. “Concurrency,” as I understand it, is about two or more cooperating logical processes, regardless of where they’re running. It’s about things like mutexes, agreement algorithms, atomic operations, and the like.

This is about right, but it’s important to note that parallel processing can involve very little concurrency. Pixel shaders running on a graphics card, for example, are incredibly parallel but don’t have many concurrency concerns: each process calculates its output without communicating with the others at all.

Something like a Tokio executor running on a single thread is concurrency without parallelism, because it’s scheduling all the work on the same device.

If you have a device that can do some work on its own, like a network card that can transmit a packet from a buffer, it would be nice to let the main processor do some work while the network card is busy sending the packet. Asynchronous programming is the pattern of submitting a job to a device like this and then doing something else until it notifies you that it’s done and ready for another job. This is a very limited kind of parallel processing, where the other processor is often a much less powerful, but specialized, piece of equipment.

Programming languages’ async systems are mostly designed to organize what should happen next when these notifications come in. In Rust, the Future provided by the I/O interface saves the Waker somewhere and wakes it up when the hardware has finished its job, which makes the executor queue the task to be continued.
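
As a concrete (if contrived) illustration of those last two paragraphs, here is a minimal future of that shape. Everything in it is hypothetical; a spawned thread plays the role of the device, and any executor can poll the future.

use std::{
    future::Future,
    pin::Pin,
    sync::{Arc, Mutex},
    task::{Context, Poll, Waker},
    thread,
    time::Duration,
};

// State shared between the future and the "device" (a thread, here).
#[derive(Default)]
struct Shared {
    done: bool,
    waker: Option<Waker>,
}

struct DeviceJob {
    shared: Arc<Mutex<Shared>>,
}

impl DeviceJob {
    // Submit the "job": the spawned thread stands in for the network card.
    fn start() -> Self {
        let shared = Arc::new(Mutex::new(Shared::default()));
        let for_device = Arc::clone(&shared);
        thread::spawn(move || {
            thread::sleep(Duration::from_millis(50)); // the device working
            let mut s = for_device.lock().unwrap();
            s.done = true;
            // The notification: wake whoever polled us last.
            if let Some(waker) = s.waker.take() {
                waker.wake();
            }
        });
        DeviceJob { shared }
    }
}

impl Future for DeviceJob {
    type Output = ();

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        let mut s = self.shared.lock().unwrap();
        if s.done {
            Poll::Ready(())
        } else {
            // Save the Waker somewhere; the notification above uses it,
            // which makes the executor queue the task to be continued.
            s.waker = Some(cx.waker().clone());
            Poll::Pending
        }
    }
}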

These terms are very muddled. And I guess one's interpretation is colored by whether one is looking at it from a purely abstract programmer/programming language point of view or from an actual hardware, physical reality, point of view.

If we look at it literally "current" or "currently" means happening now. And "concurrent" is about two or more things happening at the same time. Physics dictates that those two or more things cannot be happening in the same point in space at the same time. (Let's ignore Quantum Mechanics for now)

Clearly a single processor cannot be running two programs at the same time. So there can be no "concurrent" programs. It's either running one or the other at any moment.

And yet, that is what the computer science world has called the time slicing that goes on in multi-tasking, multi-user operating systems, that give the illusion that many programs are running "concurrently". It's an abstraction, it is not physical reality.

Meanwhile "parallel processing" really does emphasize that there really are many things going on at the same time. In different physical places, i.e. different cores or even machines. Those things are happening "concurrently" in everyday language but not in the computer science world.

Well, I tend to disagree, based on the definition of concurrency and parallelism here. Reading it, this feels quite intuitive to me.

In a nutshell: concurrent processing means multiple tasks are in flight at the same time, but they are not executed simultaneously.

Parallel processing means multiple tasks (or parts of them) are executed simultaneously.

Which means single-core but multi-threaded architectures support concurrent processing, but no parallel processing; parallel processing is only possible on multi-core architectures.

1 Like

Exactly, you tend to disagree with me after reading that article, but the article says exactly what I said:

Concurrency means executing multiple tasks at the same time but not necessarily simultaneously.

(Which is an odd way to put it anyway because for normal English speakers "simultaneous" means "at the same time" and yet it states otherwise).

The article confuses things by saying:

...so the CPU will decide to run a task first and then the other task or run half a task and half another task, etc.

Well, no, the CPU decides nothing, the operating system does. Or your language run time if there is no OS.

Obviously from a high-level, user point of view, a single processor can appear to be running two or more of what we call programs, or tasks within programs, at the same time. They get started, there is a period of time during which they seem to be running "concurrently", then they end. Although at a low, hardware level there is nothing "concurrent" about it.

Like I said, it all depends on the level of abstraction one is looking at. Sometimes software engineers have to be reminded of the actual machine they are running on and the laws of Physics.

That “necessarily” is important. Concurrency theory (yes, there really is such a thing) applies whenever you have multiple logical processes that make progress independently of each other. It doesn’t matter if they’re trading off timeslices on a single processor, running on multiple CPU cores in a shared address space, or running in different datacenters half a world apart— perils like deadlocks and data races affect them all.

Like many bits of jargon, computer science’s “concurrency” started out meaning simply “at the same time,” but picked up additional technical details over time as the theory developed. Now it’s a specific field of study related to certain concerns that things happening at the same time have, and you have to use the context to figure out which meaning is intended :confused:.

Well, I'm really not a native speaker of English; however, the linked blog tries to differentiate "same time" and "same time instant". I'm not sure how subtle those differences are, and that might be the reason I'm struggling with the different definitions :open_mouth:.

My take-away is that there clearly are subtly different interpretations possible, and I'll try to point this out in one way or another in the introduction chapter - trying not to be more academic than I need to :wink:

The important thing to avoid is implying that something is either parallel or concurrent. Most of the time, parallel programs are also concurrent, and the terms are referring to different aspects of the same thing:

  • Parallelism refers to which pieces of hardware are doing which parts of the work
  • Concurrency refers to how multiple logical processes coordinate with each other

Situations that are one of these but not the other are relatively rare, which is why it’s so hard to tell them apart sometimes. Even those of us in academia slip up reasonably often.

Yes indeed.

I'm no expert but I guess that all goes back to the likes of Tony Hoare and his work on "Communicating Sequential Processes".
Tony Hoare >> Contributions >> CSP.

Oddly enough the definition of "concurrency" on that Stanford page (See What is concurrency?) uses the analogy of doing the washing and scheduling the washer and the drier for optimum throughput. Which is to say it depends on parallel processing in multiple hardware units.

Even the Stanford CS Profs are confused by this :slight_smile:

Exactly.

I'm starting to think every software engineer, especially programming language designer, should take a month or two out and do something in a hardware description language like Verilog or VHDL.

They kind of look like C or Ada respectively, but our notions of sequential/parallel are inverted.

In such HDLs every statement is something that can happen at the same instant as every other. What looks like a sequence of statements is a parallel execution of operations. If you want things to happen sequentially you have to go out of your way to make it so.

I'm not certain that Verilog or VHDL would be sufficient, though it certainly would help convey what really happens "when the rubber hits the road".

Reason for my statement: I once worked on a chip design with a hardware electronics engineer who literally could not comprehend concurrent operation. Needless to say his designs always had multiple timing hazards that others had to remove.

One of the main promises of asynchronous, parallel processing is that it can increase overal performance of your program. Although this does not apply to any generic situation[add: ,] most of us would like to try out whether this could give the desired boost [remove: or not] to your specific challange[replace with: challenge]. However, designing and writing a program[remove: ,] ready to benefit from asynchronous execution is often challenging and sometimes requires knowledge of how the underlining[replace with: underlying] bits and peaces[replace: pieces] fit together.

Some editing to make meaning conveyed clearer and some spelling corrections.

Is there a repo to make pull requests against when I find the occasional typo, or should I post a suggested improvement here?