When we say async IO, what part is really "async"?

I'm learning async Rust recently, and I find myself not getting the topic; some fundamental piece is missing from my knowledge base, so I'm asking here for help. Rust uses an async model that is "zero-overhead", with no green threads or anything like that. Now say we only have one thread, and we "async-write" a 1MB binary sequence (in memory) to 100 different files on the disk. How does it really work in this case? How does it skip the "wait" part when the program is interacting with the disk? If we can spawn 100 threads this is "easy": we just let each thread do one write and the OS takes care of the thread scheduling (omitting numerous details that I don't really know), but what happens when we only have one thread? Can someone give an example of how "async" could improve efficiency in this case (focusing on how this one thread interacts with the disk/system API)? What should an async-write function look like (how does it use the disk API), and what tricks does it use to gain overall performance when writing to 100 files "simultaneously" with only one thread?


The basic idea behind futures is that a future can go to sleep while waiting for IO. In the case of network IO over e.g. TCP connections, the runtime uses the epoll API to register many ongoing IO operations with one listener, and when notified by epoll, the executor polls the future again to continue the operation. Similarly, if you have timers in your async code, the runtime will put the timers in a queue and notify the corresponding futures when the timers fire.

There are also cases where you just have to do something that blocks. In this case Tokio provides spawn_blocking, which will run the provided blocking operation on a separate thread pool dedicated to blocking operations. By default, Tokio has one thread per CPU core dedicated to polling futures, while having up to 512 threads dedicated to running blocking operations. Since both pools are managed by Tokio, it has the ability to seamlessly migrate threads between these two categories.

Unfortunately, disk IO is an example of the above: there is no epoll API for disk IO. Therefore Tokio uses spawn_blocking to run normal blocking IO operations, using one of the up-to-512 threads for this.
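To make the "run blocking IO on a dedicated thread" idea concrete, here's a minimal std-only sketch of the concept. The name spawn_blocking_sketch is made up for illustration; Tokio's real spawn_blocking reuses a capped pool of threads rather than spawning a fresh one per call:

```rust
use std::sync::mpsc;
use std::thread;

// Toy stand-in for Tokio's spawn_blocking: run a blocking closure on a
// separate thread and hand the result back through a channel, so the
// calling thread never blocks on the work itself.
fn spawn_blocking_sketch<T, F>(f: F) -> mpsc::Receiver<T>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // The blocking work happens here, off the "async" thread.
        let _ = tx.send(f());
    });
    rx
}

fn main() {
    // Kick off three "disk writes" without waiting on any of them...
    let pending: Vec<_> = (0..3)
        .map(|i| spawn_blocking_sketch(move || format!("wrote file {}", i)))
        .collect();
    // ...and only block when we actually need the results.
    for rx in pending {
        println!("{}", rx.recv().unwrap());
    }
}
```

The point is that the single "async" thread stays free to start all 100 writes before collecting any of the results.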

If Tokio is configured to run with the basic single-threaded scheduler, that single thread will simply be migrated between the above categories as necessary.


OK. So, down to earth: the async write is spawning another thread to do the actual work, rather than doing some "smart interaction" with the disk API? Got it.

Another question on the epoll thing around TCP connections. I find it hard to understand what "epoll" is (my background knowledge of TCP/networking is very limited). Tell me if the following understanding (of the basic concept) is correct:

  1. Suppose I send 100 HTTP requests asynchronously, each from a different TCP connection, and I want the 100 HTTP responses.
  2. I start an executor to block on the 100 responses.
  3. Deep down in the executor, there will be an event loop (roughly, a for-loop over all the futures (of the responses) again and again, until all of them are resolved).
  4. At the end of each pass (over all non-resolved futures), epoll will be called (through the poll interface?) to determine whether to poll each future again in the next pass?
  5. epoll is a way to filter out "not-ready" responses. If we didn't do this, the event loop would be much more expensive (hence a lot slower per pass).
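As a toy model of point 5, here's a std-only sketch in which a channel stands in for epoll: the executor blocks until some task is reported ready, rather than re-polling every pending future on each pass. The task ids and sleep times are made up:

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

fn run_event_loop(n_tasks: u32) -> Vec<u32> {
    let (ready_tx, ready_rx) = mpsc::channel();

    // Pretend these threads are the kernel noticing IO completions
    // at different times.
    for id in 0..n_tasks {
        let tx = ready_tx.clone();
        thread::spawn(move || {
            thread::sleep(Duration::from_millis(5 * (id as u64 + 1)));
            tx.send(id).unwrap(); // "task `id` can make progress"
        });
    }
    drop(ready_tx); // so recv() ends once every task has reported in

    let mut resolved = Vec::new();
    // The "event loop": sleep until a readiness event arrives, then
    // handle ("poll") only the task that was reported ready.
    while let Ok(id) = ready_rx.recv() {
        resolved.push(id);
    }
    resolved
}

fn main() {
    let resolved = run_event_loop(3);
    assert_eq!(resolved.len(), 3);
    println!("resolved in readiness order: {:?}", resolved);
}
```

Note the executor never iterates over not-ready tasks at all; it only reacts to readiness events, which is the cost saving described above.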

Another question about epoll: suppose epoll says "something is ready". What does it mean? Is it saying that the TCP connection is ready to receive data, or that the HTTP response has been received (and is somewhere in memory already)? Who is responsible for "putting the response into main memory"? Is it done by the CPU (reading from a TCP socket buffer or something similar?) or by the network card directly? I'm not familiar with the hardware protocol here, so it's bizarre to me what poll and epoll are doing as the response gets resolved step by step.

It's an OS (specifically Linux) problem. Until the relatively recent introduction of io_uring, Linux async file IO was terrible, borderline unusable. So, given time, Rust frameworks will be able to support proper async file IO (similar to what we have for network IO), rather than the emulated one we have right now.


Basically, when the executor polls a future, one of two things will happen:

  1. The future says that it's done.
  2. The future says there's more work to do, but I can't do it right now.

In the second case, the future has the responsibility of notifying the executor when it is ready to continue, and the executor will not poll it until it receives such a notification. In the case of network IO in Tokio, the executor has something called a reactor, which handles the epoll API. The basic idea behind epoll is that you tell the OS you want to be notified when any connection in a list is ready to continue; this can be either because some data has been fully written, or because some data has been received from the peer. Once the reactor receives such a notification from epoll, it notifies the executor that the associated future is now ready to continue work.
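Here's a runnable miniature of those two outcomes, using only the standard library. TwoStep, Flag, and block_on are made-up names for illustration, and the future wakes itself inside poll only to keep the example self-contained; a real IO future would hand its waker to the reactor instead:

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// A future that reports "can't continue yet" on the first poll and
// "done" on the second.
struct TwoStep {
    done: bool,
}

impl Future for TwoStep {
    type Output = &'static str;
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        if self.done {
            Poll::Ready("finished") // outcome 1: the future is done
        } else {
            self.done = true;
            cx.waker().wake_by_ref(); // pretend the reactor fired for us
            Poll::Pending // outcome 2: more work, but not right now
        }
    }
}

// The waker just sets a flag, so the mini executor knows to poll again.
struct Flag(AtomicBool);
impl Wake for Flag {
    fn wake(self: Arc<Self>) {
        self.0.store(true, Ordering::SeqCst);
    }
}

fn block_on<F: Future>(mut fut: F) -> F::Output {
    let flag = Arc::new(Flag(AtomicBool::new(true))); // poll once up front
    let waker = Waker::from(flag.clone());
    let mut cx = Context::from_waker(&waker);
    // SAFETY: `fut` is shadowed here and never moved afterwards.
    let mut fut = unsafe { Pin::new_unchecked(&mut fut) };
    loop {
        if flag.0.swap(false, Ordering::SeqCst) {
            // Only poll when woken: no tight loop over pending futures.
            if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
                return out;
            }
        } else {
            std::thread::yield_now(); // a real executor would park/sleep here
        }
    }
}

fn main() {
    assert_eq!(block_on(TwoStep { done: false }), "finished");
    println!("polled, went pending, woke, finished");
}
```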

As for when the data is moved into main memory, this happens when the future is polled. Epoll only notifies that there is more to do, but the future itself is what actually reads the data.


Thanks! So something like this (still in the context of TCP connection, HTTP request/responses) ?:

  • The future, when polled, if it's not ready, will tell the executor (through the cx: &mut Context) that: I'm not ready yet; please add this connection xxx to your watch list and attach my name to it, so you can poll me later when the connection is able to move forward. (This looks like a simple list-push thing.)
  • The reactor will call epoll API, which gives back a list of available connections to move forward. The executor will then figure out which futures should be polled.
  • Still, suppose we have 100 futures of responses to resolve "simultaneously". The executor will first loop through all of them to call poll. After such a (first-pass) loop, its reactor will call epoll (this is blocking, but presumably not very slow) to get the first (or first few) connections that are able to move forward. Then the executor will find the corresponding futures and poll them again (second pass). Then the reactor calls epoll again, and it goes on and on (third pass, fourth, ...) until all futures are resolved.

Am I getting close?

Some detailed questions about your post:

Question: The description here seems to suggest that the reactor responds "passively" to the system notification? Meanwhile, my understanding is that the reactor calls epoll and waits until the first notification arrives (or, if we are lucky, there could be multiple connections able to move forward when we call epoll, so it returns almost immediately with a few action items). Which one is correct?

Question: When you say "fully written", in the case of an HTTP response, where does this get written to? The network socket buffer, or somewhere in RAM where the network card has direct memory access? And then, later, in the poll method, the data is transferred to a memory block that our program's process has access to?

Hence, a poll method for a TCP response future could be something like this (pseudocode)?

poll(self: Pin<&mut Self>, cx: &mut Context) -> Poll<Self::Output> {
    // Try to get a response without blocking.
    if let Success(content) = self.try_get_instant_response() {
        return Ready(content);
    }
    // Otherwise, no instant response is available.
    // This should only happen during the first `poll`, as
    // later `poll`s must be called after an action item is notified.

    // Add to the watch list to be `epoll`ed after each event loop.
    cx.add_to_watchlist(self.connection, self.future_id);
    // Return Pending; to be `poll`ed later.
    Pending
}

You may want to have a look at the mio crate. This is the core of the reactor, and is what handles epoll.

The only thing that the context does is allow the future to obtain a value of type Waker, which it can later use to wake itself up. Telling the executor that it isn't ready to continue at this time is done by simply returning Poll::Pending from the Future::poll method.

It does need to talk to the reactor, though, but this isn't done through the context; the context is not reactor-aware, and I'm guessing Tokio uses some sort of thread-local for that. The thing it uses as a "future id" is the waker it got from the context.

Note also that I think the reactor runs in a separate thread from the threads that actually poll the futures.

With "written", I was referring to outgoing packets that have been fully written to the remote host. As for incoming data, it is first stored in a buffer inside the OS, then the reactor is notified, then the future calls some read function that copies the data out of the OS buffer.

Note that a single future might be waiting for many things at the same time using mechanisms such as join or select. In this case a notification from any part of the future will result in everything it is waiting for being polled again. The executor is also allowed to spuriously poll the future without receiving a notification.


So, if I understand it correctly: the future gets this Waker handle and somehow passes it to the reactor, and the reactor later calls the waker to "wake up the future". Under the hood, the completed wake() call syncs with the executor (through some shared state, which can be just an atomic bool in the simplest mental model) to record that this future is ready to be polled. Then the executor will catch this signal in its next event-loop pass and poll the corresponding future.
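That atomic-bool mental model can be written down directly with the std::task::Wake trait from the standard library. ReadyFlag is a made-up name, and real executors store something richer than a single bool, but the synchronization idea is the same:

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::task::{Wake, Waker};

// The simplest mental model: "waking" is just flipping a shared flag
// that the executor checks on its next event-loop pass.
struct ReadyFlag(AtomicBool);

impl Wake for ReadyFlag {
    fn wake(self: Arc<Self>) {
        self.0.store(true, Ordering::SeqCst);
    }
}

fn main() {
    let flag = Arc::new(ReadyFlag(AtomicBool::new(false)));
    let waker = Waker::from(flag.clone());

    // The reactor (possibly on another thread) calls this when epoll
    // reports that the connection can make progress:
    waker.wake();

    // The executor's next pass sees the flag and knows to poll again:
    assert!(flag.0.load(Ordering::SeqCst));
    println!("future is ready to be polled again");
}
```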

How the future passes the waker to the reactor is not relevant to the Future framework. The reactor itself is not even part of the Future framework. It's just an implementation to make async TCP connections possible under the Future framework.

Am I getting the big picture this time?

More or less. I'm not sure you've gotten yet that the future will not get polled in a tight loop by the executor. The executor will only poll a future when it first tries to execute the future, and after the future calls Waker::wake(). As I understand it, this makes Rust's futures different from a lot of other implementations, and very fast: they do little unnecessary work. For details on the execution model, see the documentation for std::future::Future, and (especially) std::future::Future::poll.

You could ask, “why does async exist?”

One reason is to support servers that process tens of thousands of concurrent requests. By default, each Rust thread has a 2MB stack allocation, so if that were implemented using threads, tens of gigabytes of memory would be reserved, most of which would go unused. By using a much smaller number of threads to process the large number of tasks, a lot of memory can be saved.

However, if there are only 100 tasks as in the example above, and the system had 200MB of memory freely available, it may well be more efficient to use threads directly. The OS will unblock threads when data is available using the system scheduler, which might well be faster than having an intermediate executor/reactor layer.
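To put rough numbers on that, a quick sketch (Rust's documented default thread stack size is 2 MiB; note the OS reserves this address space lazily, which is why most of it would sit untouched in practice):

```rust
use std::thread;

// How many GiB of stack space get reserved for a thread-per-request server.
fn reserved_gib(threads: u64, stack_bytes: u64) -> u64 {
    threads * stack_bytes / (1024 * 1024 * 1024)
}

fn main() {
    // 10k thread-per-request workers at the 2 MiB default:
    println!(
        "~{} GiB of stack reserved",
        reserved_gib(10_000, 2 * 1024 * 1024)
    );

    // If you do go thread-per-task, the stack size is tunable per thread:
    let handle = thread::Builder::new()
        .stack_size(64 * 1024) // 64 KiB is plenty for a tiny worker
        .spawn(|| 2 + 2)
        .unwrap();
    assert_eq!(handle.join().unwrap(), 4);
}
```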


I guess the question is:

how can code / computations be run AND still be listening / waiting for IO events within one single thread?

I'd say that the key idea is that computer execution is already inherently parallel between the userland program(s) and the kernel: the kernel can itself receive hardware interrupts / notifications, which avoids having one thread "actively listening / waiting for something".

So suppose you want to read from multiple physical inputs (network, disk). You can effectively "parallelize" the waits by splitting the read query from the code using the result of that query (either through a combinator / callback à la .and_then() or through an await (yield) point).

This way the runtime can loop through and "execute" all the read queries (without doing the remaining work yet), by "polling" the OS / kernel for these reads, in a non-blocking manner. The kernel will be like "Ok, let me write down those I should notify once I get myself notified by the hardware" and thus return "immediately" to the userland (runtime) having made the query, so that it can go and make the second query, etc.

For the sake of the example, imagine requests 1 and 2 not being ready but the 3rd one being ready. That 3rd task is woken, so the runtime can resume that "task" and run the "logic for the result of the request" until reaching another .await point, which lets the runtime stop executing that code and enqueue the remainder of that task. This way, we have effectively "been waiting for requests 1 and 2 in parallel", and maybe one of those got ready (and was thus woken) in the meantime; now the runtime can go and run the remaining logic for that task, and so on...
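The "kernel returns immediately and notes whom to notify" part can be seen with std's non-blocking sockets. In this sketch (fetch_reply is a made-up helper, and a real runtime would park on epoll instead of sleeping in a retry loop), the read call returns a WouldBlock error until the peer's data has actually arrived in the OS buffer:

```rust
use std::io::{ErrorKind, Read, Write};
use std::net::{TcpListener, TcpStream};
use std::thread;
use std::time::Duration;

fn fetch_reply() -> std::io::Result<Vec<u8>> {
    // A loopback "server" that waits a bit before replying.
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let addr = listener.local_addr()?;
    thread::spawn(move || {
        let (mut peer, _) = listener.accept().unwrap();
        thread::sleep(Duration::from_millis(50));
        peer.write_all(b"pong").unwrap();
    });

    let mut conn = TcpStream::connect(addr)?;
    conn.set_nonblocking(true)?; // reads now return instead of blocking

    let mut buf = [0u8; 16];
    loop {
        match conn.read(&mut buf) {
            Ok(n) if n > 0 => return Ok(buf[..n].to_vec()),
            Ok(_) => continue, // peer closed with no data (not expected here)
            // "Not ready yet": this is where a future would return Pending
            // and let epoll wake it later, instead of sleeping and retrying.
            Err(e) if e.kind() == ErrorKind::WouldBlock => {
                thread::sleep(Duration::from_millis(5));
            }
            Err(e) => return Err(e),
        }
    }
}

fn main() {
    assert_eq!(fetch_reply().unwrap(), b"pong".to_vec());
    println!("reply received");
}
```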

So at the user level there is cooperative1 concurrency (e.g., no "execution parallelism" needed2, only the waits happen in parallel), and this works thanks to having the kernel handle the IO events in parallel with this user-level execution.

1 It suffices for one badly designed Future (long computations / loops without a yield point in the middle) to block the (potentially single) thread of the runtime, starving the remaining tasks of "execution resources".

2 This can be observed thanks to a single-threaded runtime being able to spawn (potentially) non-Send Futures (e.g., compare a single-threaded Runtime's .spawn() bounds with the general Runtime's .spawn() bounds).


Thanks for the info. I understand the big picture now. Under the hood, it's a message queue: the executor keeps popping futures off the queue and polling them, while each future pushes itself back onto the queue by calling wake (either directly or by passing the waker to another helper, such as the socket listener) if more work needs to be done. There is no tight loop that keeps polling every pending future.
