Stack overflow in async main function due to excessive stack allocation (*not* from recursion)

I'm new to Rust, so this may be a newbie question.

My test application is currently crashing with a stack overflow exception on Windows. The problem is NOT an issue with recursion, but instead many of the functions have significant amounts of stack space allocated to them. For instance, the "main" function in my application has a 281K stack allocation, and several other functions have 30K stack allocations. Collectively these allocations explode the default Windows 1M stack.

Looking at the assembly, it appears that the allocation comes from a function named something like _ZN16<snip>batch4main28_$u7b$$u7b$closure$u7d$$u7d$17habc06fd5a7dbe532E which appears to be a closure that the compiler generated to enable calling "drop" on exit from the function.

Looking at the metadata for that function, it appears that the function is calling "drop" on the transitive closure for all types referenced in my function.

Is my analysis off base? Does anyone understand where the 280K stack allocation comes from? And does anyone know how I can convince the compiler to use somewhat less stack?

Clippy has a lint for excessively large futures, clippy::large_futures.

Potentially every .await merges the state of the called future with its caller, and this can snowball into having all of the state of your entire program merged into one huge struct, and it will be on the stack.

If that’s the case for you, then wrap calls to big functions in Box::pin(call()).await. This moves that state to the heap, making the caller’s future smaller.
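
A minimal sketch of that effect, using only the standard library. The functions big, tiny, caller_inline and caller_boxed are hypothetical stand-ins, not code from the application in question. The futures are only measured with std::mem::size_of_val and never polled, so no async runtime is needed:

```rust
use std::mem::size_of_val;

// Hypothetical callee: `buf` is read after the await, so it has to be
// stored inside the future that `big()` returns.
async fn big() -> u8 {
    let buf = [7u8; 16 * 1024];
    tiny().await;
    buf[0]
}

async fn tiny() {}

// Awaiting directly embeds the whole `big()` future in the caller's future.
async fn caller_inline() -> u8 {
    big().await
}

// Box::pin moves the callee's state to the heap; the caller's future
// only has to keep the pointer across the await.
async fn caller_boxed() -> u8 {
    Box::pin(big()).await
}

// Size of each caller's future, measured without polling it.
fn inline_size() -> usize {
    size_of_val(&caller_inline())
}

fn boxed_size() -> usize {
    size_of_val(&caller_boxed())
}

fn main() {
    println!("inline: {} bytes", inline_size());
    println!("boxed:  {} bytes", boxed_size());
}
```

The exact numbers depend on the compiler version, but the inline caller should report at least the 16 KiB of the buffer, while the boxed caller should be far smaller.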

Of course, if a Box::pin around the right future does the job for you, then that’s a good and simple solution.

Another, more general, approach to coping with too much stack usage in Rust is to spawn (and use) a thread with a larger custom stack size, or to grow the stack “manually”.[1] Just “increasing the limit” like this can be a perfectly fine solution if two things hold. First, you don't mind the actual amount of memory used, i.e. it doesn’t seem absurdly large[2] for your use case and the real problem is only that it lands in the limited stack memory. Second, the memory usage is predictable, i.e. it always turns out the same, so preemptively increasing the stack space works reliably. All of this assumes that the overflowing stack belongs to a thread you control or to the main thread, not to a thread in the async runtime (though those are usually configurable, too).
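
A rough sketch of the larger-stack approach with std::thread::Builder. The helper run_with_stack, the function stack_hungry and the 16 MiB figure are illustrative assumptions; in an async program the closure would be where you start your executor:

```rust
use std::thread;

// Hypothetical helper: run `f` on a fresh thread with `stack_bytes` of stack
// and hand back its result.
fn run_with_stack<T, F>(stack_bytes: usize, f: F) -> T
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    thread::Builder::new()
        .stack_size(stack_bytes)
        .spawn(f)
        .expect("failed to spawn thread")
        .join()
        .expect("worker thread panicked")
}

// Stand-in for stack-heavy work: a 2 MiB array in a single stack frame,
// which would not fit in a 1 MiB stack.
fn stack_hungry() -> usize {
    let buf = [1u8; 2 * 1024 * 1024];
    // black_box keeps the optimizer from removing the array.
    std::hint::black_box(&buf).iter().map(|&b| b as usize).sum()
}

fn main() {
    let total = run_with_stack(16 * 1024 * 1024, stack_hungry);
    println!("summed {total} bytes");
}
```

This only raises the limit for that one thread; it does not reduce how much memory the code uses.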


  1. stacker is also particularly useful in cases where recursion is involved and problematic

  2. since, if it does seem absurdly large, then getting to the bottom of this to understand your program better or perhaps even find bugs seems desirable, of course

That sounds fascinating, and seems highly likely to be related to the problem. More specifically, adding the Box::pin() calls reduced the stack consumption in main() from 280K to 90K, and the overflow went away.

In this case, the test application is calling into an external package which is where almost all the nested futures are coming from. Is there anything that can be done in the external package to reduce the size of the futures used in the package? Essentially a way of collapsing the futures from inside the package so that they don't explode in complexity?

I'm concerned about how this issue affects the package: if I'm building a package that has a complicated set of asynchronous function calls, is there a way of collapsing the size of the future state without moving all my futures to the heap?

The bloat is from holding variables across .await points, and sometimes from async function arguments too.

Check whether the library creates large arrays or structs on the stack, or passes them by value to async functions.

If something is only used between awaits, it can be moved into a narrower scope { let big } so that it disappears from the Future. I’m not sure if drop(big) will be sufficient.
Boxing it or putting it inside a Vec helps too.
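
A small sketch comparing these options; all function names here are made up for illustration, and the futures are only measured, never polled. It does not settle the drop(big) question:

```rust
use std::mem::size_of_val;

async fn tick() {}

// `big` is read after the await, so it is stored in the future.
async fn held_across_await() -> usize {
    let big = [1u8; 8192];
    tick().await;
    big.len() + big[0] as usize
}

// `big` lives only inside the inner block, which ends before the await,
// so only the small result `n` has to be kept.
async fn scoped() -> usize {
    let n = {
        let big = [1u8; 8192];
        big.len() + big[0] as usize
    };
    tick().await;
    n
}

// The array lives on the heap; the future stores only the Box pointer.
async fn boxed() -> usize {
    let big = Box::new([1u8; 8192]);
    tick().await;
    big.len() + big[0] as usize
}

// Sizes of the three futures: (held, scoped, boxed).
fn future_sizes() -> (usize, usize, usize) {
    (
        size_of_val(&held_across_await()),
        size_of_val(&scoped()),
        size_of_val(&boxed()),
    )
}

fn main() {
    let (held, scoped_size, boxed_size) = future_sizes();
    println!("held: {held}, scoped: {scoped_size}, boxed: {boxed_size}");
}
```

Only the first variant should come out at 8 KiB or more; the other two keep the array out of the future's state.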

OK, that helps. In this case, I believe that the problem is that the library calls multiple async functions from inside individual async functions, likely with a few local variables held across the awaits. But since the futures contain the union of all the variable captures, the problem compounds at each level of function call, eventually bubbling up to my test app, which only calls 5 library functions. Those 5 library functions are enough to explode the stack because of the depth of the calls.

But all this information helps a great deal to figure out how to best deal with the issue.

If you want to understand why specific futures are large, you may be interested in the unstable compiler option -Zprint-type-sizes, which prints a report of the size and fields of every type in your crate — including the autogenerated types for closures and futures. I recently contributed some improvements to the feature so that it prints more details useful for analyzing futures. For example, if I run it on a part of my game project,

cargo +nightly rustc --lib -p all-is-cubes-content -- -Zprint-type-sizes

I get a report including this future:

type: `{async block@all-is-cubes-content/src/template.rs:283:18: 290:10}`: 4920 bytes, alignment: 8 bytes
    discriminant: 1 bytes
    variant `Unresumed`: 4913 bytes
        upvar `.ingredients__parameters`: 32 bytes, offset: 0 bytes, alignment: 8 bytes
        upvar `.progress`: 56 bytes
        padding: 4825 bytes
        upvar `.ingredients__template`: 1 bytes, alignment: 1 bytes
    variant `Suspend0`: 4912 bytes
        upvar `.ingredients__parameters`: 32 bytes, offset: 0 bytes, alignment: 8 bytes
        upvar `.progress`: 56 bytes
        local `.__awaitee`: 4824 bytes, type: {async fn body of template::UniverseTemplate::build<NoTime>()}
        padding: 1 bytes
        upvar `.ingredients__template`: 1 bytes, alignment: 1 bytes
    variant `Returned`: 4913 bytes
        upvar `.ingredients__parameters`: 32 bytes, offset: 0 bytes, alignment: 8 bytes
        upvar `.progress`: 56 bytes
        padding: 4825 bytes
        upvar `.ingredients__template`: 1 bytes, alignment: 1 bytes
    variant `Panicked`: 4913 bytes
        upvar `.ingredients__parameters`: 32 bytes, offset: 0 bytes, alignment: 8 bytes
        upvar `.progress`: 56 bytes
        padding: 4825 bytes
        upvar `.ingredients__template`: 1 bytes, alignment: 1 bytes
    end padding: 6 bytes

Each “variant” of this type is one of the states the future can be in, and each Suspend* variant is an await point; the __awaitee (not a real variable name, but one filled in for the benefit of debuggers and such) is the future that's being awaited. So, in this case, we learn that the async block at line 283 made a future that is 4920 bytes, but most of that, 4824 bytes, is the future of the UniverseTemplate::build() function. Therefore, we should ignore this async block as not very substantial in itself, and go take a closer look at build().
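
For a quick cross-check on stable, without the per-variant breakdown, std::mem::size_of_val on an unpolled future gives the total size. A sketch with hypothetical inner and outer functions (not taken from the project above), showing how an awaited future counts toward its caller:

```rust
use std::mem::size_of_val;

async fn leaf() {}

// Hypothetical inner layer: 4 KiB kept alive across an await.
async fn inner() -> u8 {
    let scratch = [1u8; 4096];
    leaf().await;
    scratch[0]
}

// Hypothetical outer layer: its own locals are tiny, but its future
// still has to hold the `inner()` future while awaiting it.
async fn outer() -> u8 {
    let small = 1u8;
    inner().await + small
}

// Returns (size of inner's future, size of outer's future) in bytes.
fn sizes() -> (usize, usize) {
    (size_of_val(&inner()), size_of_val(&outer()))
}

fn main() {
    let (inner_size, outer_size) = sizes();
    println!("inner: {inner_size} bytes, outer: {outer_size} bytes");
}
```

If the outer number is only slightly larger than the inner one, the outer function is not the interesting one, which is the same conclusion the report leads to.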


For stack frames in general, Clippy has the large_stack_frames lint, but it's very imprecise because it doesn't know when code generation figures out that two stack values can occupy the same space because they are never in use at the same time — it just adds up every variable/temporary in the function. I don't know of a more precise tool for analyzing stack frames.
