Performance characteristics of deep async/await calls

I am new to Rust and currently learning the async/await part. While comparing the implementation with other languages, I am wondering about the performance characteristics of a specific pattern.

Consider an example with the following pattern:

async fn async_future1() {
    async_future2().await
}

async fn async_future2() {
    async_future3().await
}

...

async fn async_future_n_minus_1() {
    async_future_n().await
}

async fn async_future_n() {
    // m: some constant number of I/O operations
    for _ in 0..m {
        io_future().await;
    }
}

According to the executor implementation demonstrated in Applied: Build an Executor - Asynchronous Programming in Rust, my understanding of the execution flow is as follows:

Iteration 1:

future1.poll(...) -> future2.poll(...) -> future3.poll(...), ..., -> future_n.poll(...) -> io_future().poll(...). Eventually, Poll::Pending is returned from io_future().poll(...)

Here future_i is the impl Future<Output = ()> type generated from async_future_i by the compiler.

Iteration 2:

Invoked when the task is woken up. Since only the entry future (future1) is registered with the Task, the invocation again starts with future1.poll(...), leading to a calling chain similar to Iteration 1. Eventually, Poll::Pending is returned from io_future().poll(...)

Iteration 3:

Similarly, n future_i.poll() are called.

...

Iteration m:

Similarly, n future_i.poll() are called.

Thus, it seems that in this pattern the time complexity to complete async_future1() is O(n * m). That is, once the invocation chain of poll methods is deep enough, performance will degrade.

I am not sure whether I have misunderstood the implementation, whether the compiler performs some optimization to avoid this problem, or whether such an optimization is planned for the future.

I would really appreciate it if Rust experts could provide insights on this!


The calls to poll are almost certainly heavily inlined, although I guess you still have to check linearly many enum tags as you figure out how far the state machine is. That said, this is not going to be the bottleneck in any realistic application, as checking these tags is going to be very very fast, and realistically call stacks are maybe 50 calls deep at the very most.

I am not sure whether this is the case: a function can be inlined by the compiler when it is small, but I think many generated poll fns can be quite big.

I understand that call stacks won't be very deep in typical use cases. In the synchronous world, a deep call stack leads to a stack overflow. However, async/await enables many new use cases. For instance, there is an example of doing deep recursion with Kotlin coroutines. In Rust, there seems to be a real limit on how deep the async call stack can go. Also, given the current interface of fn poll(), it doesn't seem possible to implement a custom executor that optimizes for deep call stacks: from std::task::Context we can only get the waker that wakes the entry future, not a future deep inside the call stack.

Thoughts?

Regarding inlining, I imagine a function would also be inlined if it is used only once, which happens rather often when generics come into play. But I don't know for sure.

If you want to recurse very deeply, one option is to do a spawn and to await the join handle. That flattens the call stack below.

But to be honest, deep call stacks are not something I've heard of being an issue for anybody.

Could you provide some details on how to achieve it?

Sure. If you want to await some future without having it execute in your callstack, you can do the following:

use tokio::task::JoinHandle;

async fn i_am_deep_in_callstack() {
    // Awaiting the JoinHandle suspends until the spawned task finishes,
    // but foo() itself runs as its own top-level task.
    run_foo_with_spawn().await.expect("panic in foo")
}

fn run_foo_with_spawn() -> JoinHandle<()> {
    // tokio::spawn hands foo() to the executor as a new task.
    tokio::spawn(foo())
}

async fn foo() {
    println!("In foo.");
}


In this case, the foo() future becomes its own entry-level future, and is polled directly by the executor. The i_am_deep_in_callstack() future is not polled again until foo() is completely done.

Thanks! I think that works :slight_smile:

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.