Why does wrapping an async block inside another async block double memory usage?

I recently encountered an issue while working with Rust async blocks. To investigate, I wrote a small test case. Here’s a simplified example:

use std::mem;
use tokio::task::yield_now;

#[inline(never)] // it can be #[inline(always)]. It doesn't matter.
async fn async_fn() -> usize {
    let a = [0u8; 8192];
    yield_now().await;
    println!("{}", a.len());
    0
}

fn main() {
    // Same body as async_fn, written as an async block.
    let async_block = async {
        let a = [0u8; 8192];
        yield_now().await;
        println!("{}", a.len());
        0
    };

    let size_of_async_block = mem::size_of_val(&async_block);
    println!("size_of_async_block: {}", size_of_async_block); // 8194

    let size_of_async_fn = mem::size_of_val(&async_fn());
    println!("size_of_async_fn: {}", size_of_async_fn); // 8194

    // Wrapping an already-created future roughly doubles the size.
    let wrapped_async_block = async {
        async_block.await
    };
    let size_of_wrapped_async_block = mem::size_of_val(&wrapped_async_block);
    println!("size_of_wrapped_async_block: {}", size_of_wrapped_async_block); // 16389

    // Wrapping the call itself adds only one byte.
    let wrapped_async_fn = async {
        async_fn().await
    };
    let size_of_wrapped_async_fn = mem::size_of_val(&wrapped_async_fn);
    println!("size_of_wrapped_async_fn: {}", size_of_wrapped_async_fn); // 8195
}

I noticed that wrapping an async block inside another async block significantly increases the memory usage. Specifically, the memory size of the wrapped future doubles. However, when I wrap an async function call instead, the increase is minimal. I observed this behavior in a release build.

Can anyone explain why this doubling of memory usage occurs when wrapping an async block inside another async block?

I would probably file an issue with rust-lang/rust.

My guess for the difference is that the closure for the wrapped block includes a copy of the outer block, which is then copied to a separate inner value, while wrapping the async function only needs the inner block, since calling the async fn will create it.

I'm not sure what the semantics of black_box imply here. I would assume it shouldn't block optimizing the allocated space before or after the block executes?

I just wanted to be sure that Rust doesn't optimize the function in such a way that a disappears after compilation.

The code that you see was written in a hurry. I doubt that black_box affects anything here.

It's nighttime in my region, but I can fix this code tomorrow.

Yeah, I just was wondering if it was confusing the optimizer. It's quite tricky to test just one optimizer behavior!

Stack usage of regular functions is heavily optimized by LLVM, which can eliminate copies and temporaries even where rustc generates suboptimal code.

The weakness of async is that the "stack" is put into a Future enum by rustc itself, and it doesn't get the same optimization treatment from LLVM, so this process is more prone to accidental inefficiencies like that. Report them as bugs.
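
To make the "stack in a Future enum" point concrete, here is a rough hand-written sketch of the kind of state machine an async block with one await point corresponds to. The names ManualFuture, NoopWake and run_manual are made up for illustration, and the real type rustc generates is laid out differently; the point is only that a local kept across an await becomes a field of the future itself.

```rust
use std::future::Future;
use std::mem::size_of;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Hypothetical stand-in for the state machine of an async block that
// keeps `a` alive across a single await point. Illustrative only.
enum ManualFuture {
    Start,
    Suspended { a: [u8; 8192] },
    Done,
}

impl Future for ManualFuture {
    type Output = usize;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<usize> {
        let this = self.get_mut(); // allowed because ManualFuture is Unpin
        match std::mem::replace(this, ManualFuture::Done) {
            ManualFuture::Start => {
                // The "local variable" now lives inside the future.
                *this = ManualFuture::Suspended { a: [0u8; 8192] };
                cx.waker().wake_by_ref();
                Poll::Pending
            }
            ManualFuture::Suspended { a } => Poll::Ready(a.len()),
            ManualFuture::Done => panic!("polled after completion"),
        }
    }
}

// Waker that does nothing; enough for a busy-polling loop.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

// Polls the hand-written future to completion and returns its output.
fn run_manual() -> usize {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = ManualFuture::Start;
    loop {
        if let Poll::Ready(n) = Pin::new(&mut fut).poll(&mut cx) {
            return n;
        }
    }
}

fn main() {
    println!("size_of ManualFuture: {}", size_of::<ManualFuture>());
    println!("result: {}", run_manual());
}
```

LLVM never gets to shrink or merge these fields the way it does with ordinary stack slots, because by the time it sees the code they are plain struct/enum fields.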

There are various situations under which async block/fn futures are bigger than they need to be. This issue lists some of them:

I believe your specific issue might be rust-lang/rust issue #62958, "Async fn doubles argument size", which will hopefully be addressed by pull request #127522, "Relocate upvars to Unresumed state and make coroutine prefix trivial" by dingxiangfei2009, or by a similar PR.

You can see what is occupying space in your async block futures by running cargo +nightly rustc -Zprint-type-sizes, which will dump the layouts of every type in your crate, including async block futures. This may give you inspiration for how to modify your code to reduce the size.

Some specific tips:

  • If you have single large values, consider putting them in Boxes.

  • In Rust, variables are always dropped at the end of the containing block. But in an async block, this can mean that they are carried across an await point and must live in the future, even if they aren't going to be used otherwise. Therefore, it can be useful to introduce extra blocks or explicit drops to avoid keeping variables alive:

    foo().await;
    {
        let mut buf = [0; 1000];
        bar(&mut buf);
    } // let buf be dropped here, before the await point
    baz().await;
    

    You can also move such code into non-async fns, and try to keep your async blocks/fns short overall so as to avoid accidentally paying async-related costs for code that doesn't need it.
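
As a rough sketch of the two tips above, the following compares future sizes with and without a Box, and with a buffer scoped so it ends before the await. To keep it dependency-free it uses a small hand-written YieldNow future in place of tokio's yield_now; that type and the helper names (fill, inline_size, boxed_size, scoped_size) are assumptions made for this example. Exact sizes depend on the compiler version, so only the rough magnitudes matter.

```rust
use std::future::Future;
use std::mem::size_of_val;
use std::pin::Pin;
use std::task::{Context, Poll};

// Std-only stand-in for tokio::task::yield_now (an assumption for this
// sketch): returns Pending once, then Ready.
struct YieldNow(bool);

impl Future for YieldNow {
    type Output = ();

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            return Poll::Ready(());
        }
        self.0 = true;
        cx.waker().wake_by_ref();
        Poll::Pending
    }
}

fn yield_now() -> YieldNow {
    YieldNow(false)
}

// Hypothetical non-async helper that uses a temporary buffer.
fn fill(buf: &mut [u8]) {
    buf.fill(1);
}

// The array is used after the await, so it must be stored in the future.
async fn inline_buffer() -> usize {
    let a = [0u8; 8192];
    yield_now().await;
    a.len()
}

// Only the Box pointer is kept across the await; the array is on the heap.
async fn boxed_buffer() -> usize {
    let a = Box::new([0u8; 8192]);
    yield_now().await;
    a.len()
}

// The buffer's scope ends before the next await, so it is not stored.
async fn scoped_buffer() {
    yield_now().await;
    {
        let mut buf = [0u8; 1000];
        fill(&mut buf);
    } // buf is dropped here, before the await point
    yield_now().await;
}

fn inline_size() -> usize {
    size_of_val(&inline_buffer())
}

fn boxed_size() -> usize {
    size_of_val(&boxed_buffer())
}

fn scoped_size() -> usize {
    size_of_val(&scoped_buffer())
}

fn main() {
    println!("inline: {}", inline_size());
    println!("boxed:  {}", boxed_size());
    println!("scoped: {}", scoped_size());
}
```

The Box version trades the inline bytes for a heap allocation each time the future is created, so it is mostly worthwhile for large values or deeply nested futures.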

The issue isn't about wrapping an async block in another async block, it's about wrapping a future in another future. Wrapping a function in a future just costs the size of the function, which is zero, and another discriminant.

use std::future::Future;
use core::mem::size_of_val;
use tokio::task::yield_now;

fn main() {
    let closure_async = || async {
        let a = [0u8; 8192];
        yield_now().await;
        println!("{}", a.len());
        0
    };

    async fn async_fn() -> usize {
        let a = [0u8; 8192];
        yield_now().await;
        println!("{}", a.len());
        0
    }

    async fn run_async<F: Future>(f: F) -> F::Output {
        f.await
    }

    eprintln!();
    dbg!(size_of_val(&closure_async));
    dbg!(size_of_val(&async_fn));

    eprintln!();
    dbg!(size_of_val(&closure_async()));
    dbg!(size_of_val(&async_fn()));

    eprintln!();
    dbg!(size_of_val(&async { closure_async().await }));
    dbg!(size_of_val(&async { async_fn().await }));

    eprintln!();
    let closure_async_f = closure_async();
    dbg!(size_of_val(&async { closure_async_f.await }));
    let async_fn_f = async_fn();
    dbg!(size_of_val(&async { async_fn_f.await }));

    eprintln!();
    let closure_async_f = closure_async();
    dbg!(size_of_val(&run_async(closure_async_f)));
    let async_fn_f = async_fn();
    dbg!(size_of_val(&run_async(async_fn_f)));
}

// Output

[src/main.rs:25:5] size_of_val(&closure_async) = 0
[src/main.rs:26:5] size_of_val(&async_fn) = 0

[src/main.rs:29:5] size_of_val(&closure_async()) = 8195
[src/main.rs:30:5] size_of_val(&async_fn()) = 8195

[src/main.rs:33:5] size_of_val(&async { closure_async().await }) = 8208
[src/main.rs:34:5] size_of_val(&async { async_fn().await }) = 8196

[src/main.rs:38:5] size_of_val(&async { closure_async_f.await }) = 16391
[src/main.rs:40:5] size_of_val(&async { async_fn_f.await }) = 16391

[src/main.rs:44:5] size_of_val(&run_async(closure_async_f)) = 16391
[src/main.rs:46:5] size_of_val(&run_async(async_fn_f)) = 16391

Interestingly, the playground gives 1 more byte than yours. Your black_box isn't doing anything since println! doesn't return anything.

I think what's happening is that the outer future needs to store the inner future before it awaits and while it awaits, and it doesn't realize those can be in the same place. The closure having 8 extra bytes is because it's being captured by reference, which you could fix by adding move.
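
A back-of-the-envelope model of that guess, using the numbers from the output above: if the outer future reserves one slot for the captured inner future and a second, non-overlapping slot for the copy it actually awaits, plus a one-byte state tag, you get exactly 2 * 8195 + 1 bytes. InnerFuture and OuterModel are hypothetical stand-ins, not the types rustc really generates.

```rust
use std::mem::{size_of, MaybeUninit};

// Stand-in for the inner future: same size (8195 bytes) as reported above.
#[allow(dead_code)]
struct InnerFuture([u8; 8195]);

// Hypothetical model of the outer future's layout under the guess that the
// two slots are not overlapped.
#[allow(dead_code)]
struct OuterModel {
    captured: MaybeUninit<InnerFuture>, // the captured variable, before the await
    awaitee: MaybeUninit<InnerFuture>,  // the value being polled during the await
    state: u8,                          // which state the outer future is in
}

fn main() {
    println!("inner: {}", size_of::<InnerFuture>());
    println!("outer: {}", size_of::<OuterModel>());
}
```

If the compiler could prove the two slots are never live at the same time, they could share storage and the outer future would be about the size of the inner one plus a tag.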

It should also prevent code motion across it, shouldn't it?

Maybe, but there isn't anything to move across it. The only thing that comes after it is the returned zero, so it's not getting in between anything before it.

As a result of this issue, I am unsure of the best course of action. Should I create a new issue, add to the existing one, or simply hope for a fix soon?

This issue does not directly affect me, as I have circumvented it by implementing a workaround using an asynchronous block wrapped in an additional structure. However, I do not want other users to encounter this problem without knowing about it. I only discovered this issue after collecting all the necessary metrics in my application.

If it's not clearly the same problem as an existing issue, you can open a new issue.
