When we say async IO, what part is really "async"?

I guess the question is:

how can code / computations run AND still be listening / waiting for IO events within one single thread?

I'd say that the key idea is that a computer's execution is already inherently parallel between the userland program(s) and the kernel: the kernel can itself receive hardware interrupts / notifications, which avoids needing a thread that is "actively listening / waiting for something".

So suppose you want to read from multiple physical inputs (network, disk). You can effectively "parallelize" the waits by splitting the read query from the code that uses the result of that query (either through a combinator / callback à la .and_then(), or through an .await (yield) point).
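For concreteness, here is a minimal Rust sketch of those two ways of splitting "make the query" from "use its result". The `fetch` function is hypothetical and just stands in for a network or disk read, and the combinator shown is `.map()` from the `futures` crate (`.and_then()` works the same way but expects a `Result`-producing future):

```rust
use futures::FutureExt; // for the `.map()` combinator

// Hypothetical IO: stands in for a network or disk read.
async fn fetch(url: &str) -> String {
    format!("response from {url}")
}

// 1) Combinator style: the closure is "the code using the result".
fn combinator_style() -> impl std::future::Future<Output = usize> {
    fetch("https://example.com").map(|body| body.len())
}

// 2) `.await` style: everything after the `.await` is the continuation.
async fn await_style() -> usize {
    let body = fetch("https://example.com").await; // yield point
    body.len() // "the code using the result"
}
```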

This way the runtime can loop through and "execute" all the read queries (without doing the remaining work yet) by "polling" the OS / kernel for these reads in a non-blocking manner. The kernel will be like "OK, let me write down whom I should notify once I get notified by the hardware", and thus return "immediately" to the userland (the runtime) that made the query, so that it can go and make the second query, etc.
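At the `Future` level, that non-blocking query has roughly the shape below. This is not a real runtime: `check_readiness` and `register_waker` are hypothetical stand-ins for the reactor's epoll/kqueue/IOCP glue. Returning `Poll::Pending` means "my interest is registered, wake me through this `Waker` when the kernel says so", and the call returns immediately so other tasks can be polled:

```rust
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll};

struct ReadRequest {
    // Stand-in for a registration handle with the OS reactor.
    id: u64,
}

impl Future for ReadRequest {
    type Output = Vec<u8>;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        // Ask (non-blockingly) whether the data is already there.
        if let Some(bytes) = check_readiness(self.id) {
            Poll::Ready(bytes)
        } else {
            // Leave the reactor a way to wake this task later,
            // then return immediately so the runtime can poll other tasks.
            register_waker(self.id, cx.waker().clone());
            Poll::Pending
        }
    }
}

// Hypothetical reactor glue, elided here.
fn check_readiness(_id: u64) -> Option<Vec<u8>> { None }
fn register_waker(_id: u64, _waker: std::task::Waker) {}
```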

For the sake of the example, imagine requests 1 and 2 not being ready yet but the 3rd one being ready. That 3rd task is woken, so the runtime can resume it and run the "logic for the result of the request" until reaching another .await point, which lets the runtime stop executing that code and re-enqueue the remainder of the task. This way, we have effectively "been waiting for requests 1 and 2 in parallel", and maybe one of those got ready (and was thus woken) in the meantime; now the runtime can go and run the remaining logic for that task, and so on...
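Here is a runnable illustration of that interleaving (assuming tokio, with timers standing in for the IO waits): three "requests" are awaited concurrently on a single thread, and the one whose wait finishes first is woken and resumed first, while the other waits keep running "in parallel":

```rust
use std::time::Duration;
use tokio::time::sleep;

async fn request(name: &str, delay_ms: u64) -> String {
    sleep(Duration::from_millis(delay_ms)).await; // stand-in for the IO wait
    format!("{name} done")
}

#[tokio::main(flavor = "current_thread")] // single-threaded on purpose
async fn main() {
    // Total wall time is ~300 ms, not 600 ms: the waits overlap.
    let (a, b, c) = tokio::join!(
        request("request 1", 300),
        request("request 2", 200),
        request("request 3", 100), // ready (and woken) first
    );
    println!("{a} / {b} / {c}");
}
```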

So at the user level there is cooperative¹ concurrency (i.e., no "execution parallelism" is needed², only the waits happen in parallel), and this works thanks to the kernel handling the IO events in parallel with this user-level execution.


¹ It suffices for one badly designed Future (long computations / loops without a yield point in the middle) to block the (potentially single) runtime thread, starving the remaining tasks of "execution resources".
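A sketch of that footnote (assuming tokio): the first future hogs the runtime thread because its loop contains no yield point; the second periodically hands control back so the other tasks can make progress.

```rust
async fn badly_designed() -> u64 {
    let mut sum = 0;
    for i in 0..1_000_000_000u64 {
        sum += i; // long computation, no `.await`: nothing else runs on this thread
    }
    sum
}

async fn cooperative() -> u64 {
    let mut sum = 0;
    for i in 0..1_000_000_000u64 {
        sum += i;
        if i % 1_000_000 == 0 {
            tokio::task::yield_now().await; // explicit yield point
        }
    }
    sum
}
```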

² This can be observed in a single-threaded runtime being able to spawn (potentially) non-Send Futures (e.g., compare a single-threaded Runtime's .spawn() bounds with the general Runtime's .spawn() bounds).
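For example (assuming tokio): tokio::spawn requires the future to be Send because it may migrate between worker threads, while tokio::task::spawn_local on a LocalSet accepts a !Send future, such as one holding an Rc.

```rust
use std::rc::Rc;
use tokio::task::LocalSet;

#[tokio::main(flavor = "current_thread")]
async fn main() {
    let local = LocalSet::new();
    local
        .run_until(async {
            let not_send = Rc::new(42); // `Rc` is !Send
            // tokio::spawn(async move { println!("{not_send}") }); // error: future is not `Send`
            tokio::task::spawn_local(async move { println!("{not_send}") })
                .await
                .unwrap();
        })
        .await;
}
```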
