I recently encountered an issue while working with Rust async blocks. To investigate, I wrote a small test case. Here’s a simplified example:
let async_block = async {
    let a = [0u8; 8192];
    yield_now().await;
    println!("{}", a.len());
    0
};
#[inline(never)] // #[inline(always)] makes no difference here
async fn async_fn() -> usize {
    let a = [0u8; 8192];
    yield_now().await;
    println!("{}", a.len());
    0
}
let size_of_async_block = mem::size_of_val(&async_block);
println!("size_of_async_block: {}", size_of_async_block); // 8194
let size_of_async_fn = mem::size_of_val(&async_fn());
println!("size_of_async_fn: {}", size_of_async_fn); // 8194
let wrapped_async_block = async {
    async_block.await
};
let size_of_wrapped_async_block = mem::size_of_val(&wrapped_async_block);
println!("size_of_wrapped_async_block: {}", size_of_wrapped_async_block); // 16389
let wrapped_async_fn = async {
    async_fn().await
};
let size_of_wrapped_async_fn = mem::size_of_val(&wrapped_async_fn);
println!("size_of_wrapped_async_fn: {}", size_of_wrapped_async_fn); // 8195
I noticed that wrapping an async block inside another async block significantly increases memory usage: the size of the wrapped future roughly doubles (8194 → 16389 bytes). When I wrap an async function call instead, the increase is minimal (8194 → 8195 bytes). I observed this behavior in a release build.
Can anyone explain why this doubling of memory usage occurs when wrapping an async block inside another async block?
My guess for the difference is that the closure for the wrapped block contains a copy of the outer block, which is then copied into a separate inner value, whereas wrapping the async function only needs the inner future, since calling the async fn is what creates it.
I'm not sure what the semantics of black_box imply here; I would assume it shouldn't block optimizing the space allocated before or after the block executes?
Stack usage of regular functions is heavily optimized by LLVM, which can eliminate copies and temporaries even where rustc generates suboptimal code.
The weakness of async is that the "stack" is put into a Future enum by rustc itself, and it doesn't get the same optimization treatment from LLVM, so this process is more prone to accidental inefficiencies like that. Report them as bugs.
You can see what is occupying space in your async block futures by running cargo +nightly rustc -- -Zprint-type-sizes, which will dump the layouts of every type in your crate, including async block futures. This may give you inspiration for how to modify your code to reduce the size.
Some specific tips:
If you have single large values, consider putting them in Boxes.
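A rough sketch of the boxing tip. This is not the original code: it uses std::future::ready as a dependency-free stand-in for an await point, which is enough to compare sizes since a future's layout is fixed at compile time:

```rust
use std::mem;

// Returns (size of future holding the array inline, size of future holding a Box).
fn future_sizes() -> (usize, usize) {
    // The array is live across the await point, so it is stored in the future.
    let unboxed = async {
        let a = [0u8; 8192];
        std::future::ready(()).await;
        a.len()
    };
    // Only the 8-byte Box pointer is live across the await point;
    // the array itself lives on the heap.
    let boxed = async {
        let a = Box::new([0u8; 8192]);
        std::future::ready(()).await;
        a.len()
    };
    (mem::size_of_val(&unboxed), mem::size_of_val(&boxed))
}

fn main() {
    let (unboxed, boxed) = future_sizes();
    println!("unboxed: {unboxed} bytes, boxed: {boxed} bytes");
}
```

The exact numbers depend on the compiler version, but the boxed future should be a couple of orders of magnitude smaller.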
In Rust, variables are always dropped at the end of the containing block. But in an async block, this can mean that they are carried across an await point and must live in the future, even if they aren't going to be used otherwise. Therefore, it can be useful to introduce extra blocks or explicit drops to avoid keeping variables alive:
foo().await;
{
    let mut buf = [0; 1000];
    bar(&mut buf);
} // let buf be dropped here, before the await point
baz().await;
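A sketch of the effect on future size, again using std::future::ready as a dependency-free await point; use_buf here is a hypothetical stand-in for bar:

```rust
use std::mem;

fn use_buf(_buf: &mut [u8]) {}

// Returns (size with buf scoped before the await, size with buf held across it).
fn scope_sizes() -> (usize, usize) {
    let scoped = async {
        {
            let mut buf = [0u8; 1000];
            use_buf(&mut buf);
        } // buf is dropped here, so the future need not store it
        std::future::ready(()).await;
    };
    let held = async {
        let mut buf = [0u8; 1000];
        use_buf(&mut buf);
        std::future::ready(()).await;
        use_buf(&mut buf); // buf is still live across the await point
    };
    (mem::size_of_val(&scoped), mem::size_of_val(&held))
}

fn main() {
    let (scoped, held) = scope_sizes();
    println!("scoped: {scoped} bytes, held: {held} bytes");
}
```

Only locals that are live across a suspension point end up stored in the generated future, which is why the extra scope makes such a difference.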
You can also move such code into non-async fns, and try to keep your async blocks/fns short overall so as to avoid accidentally paying async-related costs for code that doesn't need it.
The issue isn't about wrapping an async block in another async block, it's about wrapping a future in another future. Wrapping a function in a future just costs the size of the function, which is zero, plus another discriminant.
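The "size of the function is zero" part is easy to check directly: a fn item is a zero-sized type, and only calling it materializes the future. A minimal sketch (this async_fn is a simplified stand-in for the one in the question, with no await point):

```rust
use std::mem;

// A minimal async fn with no awaits, just to inspect sizes.
async fn async_fn() -> usize {
    0
}

// Returns (size of the fn item itself, size of the future it produces).
fn item_and_future_sizes() -> (usize, usize) {
    (mem::size_of_val(&async_fn), mem::size_of_val(&async_fn()))
}

fn main() {
    let (item, future) = item_and_future_sizes();
    println!("fn item: {item} bytes, future: {future} bytes");
}
```

The fn item takes no space at all, so capturing it in an outer async block is free; the tiny future only appears at the call site, inside the wrapper.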
Interestingly, the playground reports one more byte than yours. Your black_box isn't doing anything, since println! doesn't return anything.
I think what's happening is that the outer future needs to store the inner future before it awaits and while it awaits, and it doesn't realize those can be in the same place. The closure having 8 extra bytes is because it's being captured by reference, which you could fix by adding move.
Maybe, but there isn't anything to move across it. The only thing that comes after it is the returned zero, so it's not getting in between anything before it.
As a result of this issue, I am unsure of the best course of action. Should I create a new issue, add to the existing one, or simply hope for a fix soon?
This issue does not directly affect me, as I have circumvented it by implementing a workaround using an asynchronous block wrapped in an additional structure. However, I do not want other users to encounter this problem without knowing about it. I only discovered this issue after collecting all the necessary metrics in my application.