Rust compiler generated async Future code?

llint · October 7, 2021, 11:30am

I've been exploring Rust's async/.await features, and while reading the Async Book I had a hard time figuring out how async code actually gets executed by the executor, until I finally realized that there are two categories of Futures in Rust:

1. Compiler generated state machine code, which wraps the original async block/fn in a compiler generated Future type that internally performs the magic of actually "resuming" the async code.
2. User coded Future types that actually "block" at the end of the async invocation chain, for example TimerFuture, SocketFuture, etc.

At the Async Book's GitHub issue tracker, I added a proposal to elaborate on the compiler generated Future type, to help (myself) understand the actual behavior. The pseudo code I provided (as well as the content in general) might not be accurate at all, so I'm wondering if anyone (from the Rust async experts) can provide me (and everyone else) with the missing details on how async Rust works. The author of the Rust Async Book might then be able to add it to the book eventually!

Any insights would be greatly appreciated!

Excerpt from the issue:

When discussing Rust Futures, pretty much all the information I found online talks about Futures that are coded by a human user, e.g. TimerFuture, SocketReadFuture, etc. These Futures are what I call "actual blocking Futures", in that the async code flow actually ends up getting blocked at these points.
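To make the "user coded" category concrete, here is a minimal sketch of such a leaf Future, loosely modelled on the Async Book's TimerFuture but not copied from it. The names Shared, ThreadWaker and block_on are made up for this illustration, and block_on is only a toy driver so the example can run on its own.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};
use std::thread;
use std::time::Duration;

// State shared between the future and the timer thread (hypothetical name).
struct Shared {
    done: bool,
    waker: Option<Waker>,
}

// Hand-written leaf future: it completes once a background thread has slept.
struct TimerFuture {
    shared: Arc<Mutex<Shared>>,
}

impl TimerFuture {
    fn new(delay: Duration) -> Self {
        let shared = Arc::new(Mutex::new(Shared { done: false, waker: None }));
        let for_thread = shared.clone();
        thread::spawn(move || {
            thread::sleep(delay);
            let mut s = for_thread.lock().unwrap();
            s.done = true;
            // Tell whoever polled us last that polling again is worthwhile.
            if let Some(w) = s.waker.take() {
                w.wake();
            }
        });
        TimerFuture { shared }
    }
}

impl Future for TimerFuture {
    type Output = ();
    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        let mut s = self.shared.lock().unwrap();
        if s.done {
            Poll::Ready(())
        } else {
            // Not ready: remember the waker and report Pending. Nothing blocks here.
            s.waker = Some(cx.waker().clone());
            Poll::Pending
        }
    }
}

// Toy waker (an assumption for this sketch): unparks the thread inside block_on.
struct ThreadWaker(thread::Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

// Toy single-future driver, standing in for a real executor.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
        thread::park(); // may wake spuriously; the loop simply polls again
    }
}
```

Calling block_on(TimerFuture::new(Duration::from_millis(20))) returns once the timer thread has fired. Note that poll itself never blocks: it returns Pending, and the waiting happens in whatever drives the future.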

However, there is another category of Futures, generated by the compiler, that wraps the async fn/block code into the so-called "state machine code". The poll method of such a compiler generated Future should be able to "resume" execution of the async code wrapped inside, whenever appropriate.

It took me a long time to realize that it is the compiler generated state machine Future that actually "executes" (or resumes) the async code, which is not clear at all from reading the Async Book.
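A small experiment makes this visible: an async block evaluates to a value of an anonymous, compiler generated type that implements Future, and its body only runs when something calls poll on it. The sketch below uses only the standard library; NoopWake, assert_is_future and lazy_demo are names invented for the illustration.

```rust
use std::cell::Cell;
use std::future::Future;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Waker that does nothing; enough to build a Context for a single manual poll.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

// Compiles only if the argument implements Future<Output = u32>.
fn assert_is_future<F: Future<Output = u32>>(_: &F) {}

// Returns (body ran before poll?, body ran after poll?, output of the poll).
fn lazy_demo() -> (bool, bool, Option<u32>) {
    let ran = Cell::new(false);

    // Creating the future runs none of the body.
    let fut = async {
        ran.set(true);
        7u32
    };
    assert_is_future(&fut);
    let ran_before_poll = ran.get();

    // Polling the generated future is what executes the body.
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    let output = match fut.as_mut().poll(&mut cx) {
        Poll::Ready(v) => Some(v),
        Poll::Pending => None,
    };
    (ran_before_poll, ran.get(), output)
}
```

Here lazy_demo() returns (false, true, Some(7)): the body has not run when the async block is created, and a single poll runs it to completion because it contains no .await that could return Pending.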

Furthermore, I think it's worth mentioning that in the chapter "Build an Executor", inside the "run" method, the top level task's context is passed to the top level Future, which should internally pass it down to every nested Future it encounters. That's why a user defined Future (e.g. TimerFuture) can register the associated "wake" method, which re-queues the original top level task Future to the executor when the user defined Future is ready. The top level task Future's poll method can then be invoked again, and internally the async code blocked at some nested level is woken up and continues. Without this information, it was hard for me to reason about how exactly the "wake" method wakes up the top level task Future by re-queueing it.
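The re-queueing can be sketched with a toy executor that uses only the standard library. This is not the Async Book's executor: Task, Queue, Log, YieldOnce, run and demo are hypothetical names, and the leaf future wakes itself immediately instead of waiting for a timer or socket.

```rust
use std::collections::VecDeque;
use std::future::Future;
use std::pin::Pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};

type BoxFuture = Pin<Box<dyn Future<Output = ()> + Send>>;
type Queue = Arc<Mutex<VecDeque<Arc<Task>>>>;
type Log = Arc<Mutex<Vec<&'static str>>>;

// Hypothetical top level task: the boxed future plus a handle to the run queue.
struct Task {
    future: Mutex<Option<BoxFuture>>,
    queue: Queue,
}

impl Wake for Task {
    // wake() runs nothing itself; it only puts the whole task back on the queue.
    fn wake(self: Arc<Self>) {
        let queue = self.queue.clone();
        queue.lock().unwrap().push_back(self);
    }
}

// Hypothetical leaf future: Pending on the first poll, Ready on the second.
struct YieldOnce {
    yielded: bool,
}

impl Future for YieldOnce {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.yielded {
            return Poll::Ready(());
        }
        self.yielded = true;
        // `cx` is the context the executor built for the top level task; it
        // reached this leaf through the polls of `outer` and `inner`.
        cx.waker().wake_by_ref();
        Poll::Pending
    }
}

async fn inner(log: Log) {
    log.lock().unwrap().push("inner: before await");
    YieldOnce { yielded: false }.await;
    log.lock().unwrap().push("inner: after await");
}

async fn outer(log: Log) {
    log.lock().unwrap().push("outer: start");
    inner(log.clone()).await;
    log.lock().unwrap().push("outer: end");
}

// Polls queued tasks until the queue is empty; returns how many polls happened.
fn run(fut: BoxFuture) -> usize {
    let queue: Queue = Arc::new(Mutex::new(VecDeque::new()));
    let task = Arc::new(Task {
        future: Mutex::new(Some(fut)),
        queue: queue.clone(),
    });
    queue.lock().unwrap().push_back(task);
    let mut polls = 0;
    loop {
        // Hold the lock only for the pop, so wake() can lock the queue during poll.
        let next = queue.lock().unwrap().pop_front();
        let task = match next {
            Some(task) => task,
            None => return polls,
        };
        let waker = Waker::from(task.clone());
        let mut cx = Context::from_waker(&waker);
        let mut slot = task.future.lock().unwrap();
        if let Some(mut fut) = slot.take() {
            polls += 1;
            if fut.as_mut().poll(&mut cx).is_pending() {
                *slot = Some(fut); // not finished: keep it for the next wake-up
            }
        }
    }
}

fn demo() -> (usize, Vec<&'static str>) {
    let log: Log = Arc::new(Mutex::new(Vec::new()));
    let polls = run(Box::pin(outer(log.clone())));
    let entries = log.lock().unwrap().clone();
    (polls, entries)
}
```

In this sketch demo() reports two polls of the top level task. The first stops at the leaf's Pending, and the second, triggered by the re-queue in wake(), continues inside inner right after the .await, without re-running anything before it.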

It would also be (extremely) useful to show some example compiler generated state machine pseudo code, to help see the hidden part of the iceberg, especially what the compiler generated poll method looks like. I even have a guessed version of my own (very, very pseudo code, it might even be totally wrong, so show us the correct code! :D)

struct CompilerGeneratedStateMachineFuture {
    started: bool,
    code: StateMachineCode,
    // the result of the current nested inner future, if any (if the current async fn is blocked on some inner Future.await);
    // if there are many .await inside the same async block, this value will be updated accordingly
    innerFutureResult: Option<T>,
}

struct Context {
    chain: Stack<dyn Future>, // should probably be a boxed enum, but for illustration purposes, let's put it this way
    wake: Waker, // the logic of wake is supplied by a specific executor, so the top level task can be re-queued to the executor! (though we don't use it in this pseudo code)
}

// code.resume would internally call the inner future's poll for the first time, in nested fashion, until it hits the first blocker at some level; each future is responsible for adding itself to the chain
// code.resume() returns Poll<T> - the same return type of poll() function - since code.resume would return Poll::Pending if an inner most 
// code::resume(Option<T>)

impl Future for CompilerGeneratedStateMachineFuture {
    type Output = T;
    fn poll(self: Pin<&mut Self>, cx: &mut Context) -> Poll<T> {
        if !self.started { // 1.
            self.started = true;
            // the future hasn't been polled yet, run the state machine code from the start of the function
            // since the compiler would transform all the .await sites to Future.poll(), so poll would form the nested async future calling chain
            // if at some nested level, one future returns Poll::Pending, we return from here; if it's Poll::Ready, we return from there as well
            // so code.resume would itself return Poll<> result
            // every level should add self (future) to the future calling chain, and when the async function or future completes, it removes itself from the nested future chain
            // who should add the foundational future at the end of the chain, who adds it? could be the compiler generated code adding it, if the future is not ready
            cx.chain.push(self);
            let r = self.code.resume(None); // run the state machine code of the current future from the start - no initial intermediate value
            if r != Poll::Pending { // a Pending here would definitely be caused by a foundational future not being ready yet!
                cx.chain.pop();
            }
            return r;
        } else if self.innerFutureResult != None { // 2.
            let r = self.innerFutureResult; // moved, so innerFutureResult becomes None
            return self.code.resume(r); // continue the state machine of the current block
        } else { // 3.
            // the chain is not empty without a result, meaning we were blocked at some foundational future at the end of the chain - the end of the nested future chain must be the foundational future that is Pending
            let mut r: Poll<T> = Poll::Ready(());
            loop {
                if cx.chain.is_empty() {
                    return r;
                }
                let blocked = cx.chain.last(); // don't pop it yet
                blocked.innerFutureResult = Option(r); // convert poll result to option, r would normally be set from the last iteration of the loop
                r = blocked.poll(cx); // this might resolve now; if this is the compiler generated future, then when we reach here, its innerFutureResult must be set to a valid value! so inside this poll, the control would go to 2.
                if r != Poll::Pending {
                    cx.chain.pop();
                    continue;
                } else {
                    return Poll::Pending;
                }
            }
        }
    }
}
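For comparison, here is a hand-written approximation of the kind of state machine the compiler derives from an async block with a single .await. It is a sketch under simplifying assumptions, not actual compiler output: ReadyLater, AddOne, NoopWake and drive_add_one are invented names, and the real generated type also has to deal with pinning and borrows held across .await points, which this example avoids by using only Unpin types.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Hypothetical leaf future: Pending on the first poll, Ready(41) on the second.
struct ReadyLater {
    polled: bool,
}

impl Future for ReadyLater {
    type Output = u32;
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<u32> {
        if self.polled {
            return Poll::Ready(41);
        }
        self.polled = true;
        cx.waker().wake_by_ref(); // ask to be polled again
        Poll::Pending
    }
}

// Hand-written stand-in for
//     async { let v = ReadyLater { polled: false }.await; v + 1 }
// One variant per suspension point, each holding whatever is live there.
enum AddOne {
    Start,
    Waiting(ReadyLater),
    Done,
}

impl Future for AddOne {
    type Output = u32;
    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<u32> {
        let this = self.get_mut(); // allowed because AddOne is Unpin
        loop {
            match this {
                // Code before the first .await: create the inner future.
                AddOne::Start => {
                    *this = AddOne::Waiting(ReadyLater { polled: false });
                }
                // The .await itself: poll the inner future with the SAME cx.
                AddOne::Waiting(inner) => match Pin::new(inner).poll(cx) {
                    // Suspend. The state stays Waiting, so the next call to
                    // poll resumes exactly at this point.
                    Poll::Pending => return Poll::Pending,
                    // Code after the .await.
                    Poll::Ready(v) => {
                        *this = AddOne::Done;
                        return Poll::Ready(v + 1);
                    }
                },
                AddOne::Done => panic!("AddOne polled after completion"),
            }
        }
    }
}

// Waker that does nothing; the loop below re-polls unconditionally.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

// Polls the state machine to completion; returns (number of polls, result).
fn drive_add_one() -> (u32, u32) {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = AddOne::Start;
    let mut polls = 0;
    loop {
        polls += 1;
        if let Poll::Ready(v) = Pin::new(&mut fut).poll(&mut cx) {
            return (polls, v);
        }
    }
}
```

In this sketch drive_add_one() returns (2, 42). There is no chain stored in the Context and no result handed back in from outside: each future owns its inner future as part of its own state, polls it directly, and passes the same cx down, so "resuming" is just calling poll again and matching on the saved state.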

alice · October 8, 2021, 7:08am

There is an example of a manual Future impl in the Tokio tutorial. I wouldn't expect much response on the async book, as nobody is working on it these days.

