Is it common to get stack overflows on Windows 11?

I'm writing a simple Rust CLI tool that uses the inquire crate to prompt a user to take various ad-hoc actions against cloud resources.

Think common operations like:

  • Create / delete / update an Amazon DynamoDB table
  • Attach an AWS IAM policy to a Role

My main function has an "outer" infinite loop that calls a function to prompt the user for the action to take. That function then calls out to various Rust modules that prompt for details (e.g., which resource to delete?) and perform the action.

#[tokio::main]
async fn main() {
    loop {
        let _ = select_function().await;
    }
}

I have an enum that defines the actions a user can take, when prompted.

#[derive(strum::Display, EnumIter)]
enum AwsOperation {
    #[strum(to_string = "DynamoDB: Create Table")]
    DynamoDBCreateTable,
    #[strum(to_string = "DynamoDB: Update Table")]
    DynamoDBUpdateTable,
    // .... around 20 other operations
}

Then in my select_function() function, I prompt the user for which operation they want to use, and each enum variant triggers a function in a child module.

use colored::Colorize; // needed for .color() on &str
use strum::IntoEnumIterator; // needed for AwsOperation::iter()

async fn select_function() -> Result<()> {
    let function_list = AwsOperation::iter().map(|i| i.to_string()).collect::<Vec<_>>();

    let selected_function = inquire::Select::new("Select a function", function_list).prompt()?;

    // Find the corresponding enum variant based on the selected string
    let operation = AwsOperation::iter()
        .find(|op| op.to_string() == selected_function)
        .unwrap_or_else(|| {
            let color = colored::Color::TrueColor { r: 220, g: 80, b: 110 };
            println!("{}", "Invalid function selected".color(color));
            AwsOperation::Exit // Default to Exit if no match found
        });

    match operation {
        AwsOperation::DynamoDBCreateTable => dynamodb::create_table().await,
        AwsOperation::DynamoDBDeleteTable => dynamodb::delete_table().await,
        AwsOperation::AWSSelectCredentialProfile => util::set_aws_profile().await,
        AwsOperation::AWSSelectRegion => util::set_region().await,
        // ... a bunch more operations implemented here ....
        AwsOperation::Exit => std::process::exit(0),
    };
    Ok(())
}

The Problem

This program was recently working pretty well on Windows 11 and macOS, but after some recent updates, I started getting stack overflow errors on the main thread on Windows 11.

This is really surprising to me, because I am not dealing with very large data sets here. The service-specific AWS SDKs for things like IAM, DynamoDB, S3, CloudFormation, etc. simply allow you to retrieve a list of resources and then call various APIs against those. I'm currently working against an AWS development account that does not have large numbers of resources, so it's not like I'm loading in hundreds or thousands of cloud resources into memory.

During my research, I discovered that on Windows the main thread has a small default stack size of 1 MB. I also discovered that tokio lets you set a thread stack size, but this does NOT affect the main thread. It only applies to the worker threads spawned by the tokio runtime. I also discovered that a release build may reduce stack usage, because optimized code uses the stack more efficiently.

What I've Tried

  • Switching from Rust 1.87.0 stable to 1.88.0-beta.6 - resulted in the same stack overflow problems
  • Building a Docker container on Docker Desktop (Linux on WSL2, Windows 11) - this successfully worked around the stack overflow issue (for now)
  • Compiling a release build with cargo run --release - this successfully worked around the stack overflows (for now)

To the best of my knowledge, I am not doing any explicit memory allocation on the stack, at least nothing that would dramatically increase stack usage. Each of my child modules simply calls the necessary AWS APIs, stores the resource details / results, and then either prompts the user to take an additional action based on the returned data or goes back to the main action menu.

  • Is this a common issue for Windows applications?
  • Is this a common issue for users of the tokio runtime?
  • What sorts of things should I be checking for, to identify stack overflow root causes?
  • I'm running the application in the VS Code debugger with the CodeLLDB extension, but I can't figure out what to look at next.

I've had it happen.

I've seen a few similar questions here. This is a good example.

The last time I had it happen, I doubled the stack size with something like this and moved on. It has not happened again.

In my experience, Rust programs tend to use more stack space than programs built with other languages.
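
The snippet referred to above isn't preserved in this copy of the thread. The following is a hedged sketch of one common way to raise the main thread's stack with the MSVC toolchain. A `build.rs` passes the linker's `/STACK` option, which sets the reserved stack size of the executable's main thread. The helper name `stack_link_arg` and the 8 MiB value are assumptions for illustration, and all other targets are left untouched.

```rust
// build.rs (next to Cargo.toml). Sketch only: assumes the MSVC toolchain,
// and the 8 MiB size is an arbitrary example value.
use std::env;

const MAIN_STACK_BYTES: u64 = 8 * 1024 * 1024;

/// Hypothetical helper: returns the linker argument for targets where the
/// flag is known, or None so other platforms keep their defaults.
fn stack_link_arg(target: &str) -> Option<String> {
    if target.ends_with("windows-msvc") {
        // link.exe's /STACK option sets the main thread's reserved stack size.
        Some(format!("/STACK:{MAIN_STACK_BYTES}"))
    } else {
        None
    }
}

fn main() {
    // Cargo sets TARGET for build scripts.
    let target = env::var("TARGET").unwrap_or_default();
    if let Some(arg) = stack_link_arg(&target) {
        // Pass the argument only when linking binary targets.
        println!("cargo:rustc-link-arg-bins={arg}");
    }
    // Re-run the script only when it changes.
    println!("cargo:rerun-if-changed=build.rs");
}
```

This only affects the main thread. Threads spawned through `std::thread`, including tokio's workers, get their stack size from the code that spawns them.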

Also anecdotal, but from what I understand nested futures can be quite large. You could try boxing some of them up.
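
Nested futures grow because an `async fn` compiles to a state machine. That state machine stores every local still needed after an `.await`, plus the entire state of the future currently being awaited. Below is a minimal std-only sketch that measures this with `std::mem::size_of_val`. The function names are made up for illustration.

```rust
use std::hint::black_box;
use std::mem::size_of_val;

// A no-op future; awaiting it is still a suspension point for the compiler.
async fn yield_point() {}

// Each level keeps its own 4 KiB buffer alive across an `.await`,
// so the buffer has to be stored inside that level's future.
async fn inner() {
    let buf = [0u8; 4096];
    yield_point().await;
    let _ = black_box(&buf);
}

async fn middle() {
    let buf = [0u8; 4096];
    inner().await; // `inner`'s whole state is embedded in `middle`'s
    let _ = black_box(&buf);
}

async fn outer() {
    let buf = [0u8; 4096];
    middle().await; // ...and `middle`'s (already containing `inner`'s) here
    let _ = black_box(&buf);
}

fn main() {
    // Creating a future does not run it; this only measures its size.
    println!("inner:  {} bytes", size_of_val(&inner()));
    println!("middle: {} bytes", size_of_val(&middle()));
    println!("outer:  {} bytes", size_of_val(&outer()));
}
```

Each level is at least as large as everything below it. A deep chain of awaited calls therefore ends up as one large value wherever the outermost future lives.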

You could just spawn a new thread in main and join it, that way all threads that actively do something will be rust-managed threads so you control their stack size.
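
A std-only sketch of that suggestion follows. It uses `std::thread::Builder::stack_size` to run the real entry point on a thread with a stack size you choose, then joins it. The helper name `run_with_stack`, the thread name, and the 8 MiB size are assumptions. The tokio part appears only as a comment, and `async_main` there is a hypothetical entry point.

```rust
use std::thread;

/// Hypothetical helper: runs `f` on a new thread with `stack_bytes` of stack,
/// waits for it, and returns its result. A panic inside `f` is re-raised on
/// the calling thread.
fn run_with_stack<F, T>(stack_bytes: usize, f: F) -> T
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    thread::Builder::new()
        .name("app-main".into())
        .stack_size(stack_bytes)
        .spawn(f)
        .expect("failed to spawn thread")
        .join()
        .unwrap_or_else(|payload| std::panic::resume_unwind(payload))
}

fn main() {
    let result = run_with_stack(8 * 1024 * 1024, || {
        // With tokio, this closure would replace `#[tokio::main]`, e.g.
        // (not compiled here; `async_main` is hypothetical):
        //
        //     tokio::runtime::Builder::new_multi_thread()
        //         .enable_all()
        //         .build()
        //         .expect("failed to build runtime")
        //         .block_on(async_main())
        //
        // Here we just report which thread the work ran on.
        thread::current().name().map(str::to_owned)
    });
    println!("ran on {result:?}");
}
```

`block_on` then drives the top-level future on the big-stack thread instead of the 1 MB main thread. Worker threads can still be sized separately through tokio's runtime builder.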

Did your research include at least one Google search (something like "windows stack size rust")? That would immediately bring up a previous thread on this topic.

I don't think anything fundamental has changed since then.

It's normal for Rust at opt-level=0 to use the stack very inefficiently. Some of the design of futures relies on zero-cost abstractions that are not zero-cost in debug builds.

There's no point increasing the stack size. No matter how much you increase it, the unoptimized, inefficient code will soon use it all up anyway. It's better to make the compiler use the stack efficiently instead.

Increase the opt-level for debug builds in Cargo.toml:

[profile.dev]
opt-level = 1 # or 2 if needed

It also helps to change big_async_function().await to Box::pin(big_async_function()).await. This moves the future's state off the stack and onto the heap. When applied to less frequently called functions, the cost is negligible, and it may even improve the performance of the async code around it.
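
Below is a hedged, std-only sketch of that suggestion applied to a dispatch `match` like `select_function`. `create_table`, `delete_table`, and `Op` are hypothetical stand-ins for the real handlers, and the buffer sizes are arbitrary. The tiny `block_on` exists only so the example runs without tokio. In the real program, tokio drives the futures.

```rust
use std::future::{ready, Future};
use std::hint::black_box;
use std::mem::size_of_val;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Hypothetical stand-ins for handlers such as dynamodb::create_table. Each keeps
// a large buffer alive across an `.await`, so the buffer lives in its future.
async fn create_table() -> usize {
    let scratch = [1u8; 32 * 1024];
    ready(()).await;
    black_box(&scratch).len()
}

async fn delete_table() -> usize {
    let scratch = [2u8; 48 * 1024];
    ready(()).await;
    black_box(&scratch).len()
}

enum Op {
    CreateTable,
    DeleteTable,
}

// Awaiting inline: this future must reserve room for its largest child.
async fn dispatch_inline(op: Op) -> usize {
    match op {
        Op::CreateTable => create_table().await,
        Op::DeleteTable => delete_table().await,
    }
}

// Boxing each child: while awaiting, this future only holds a heap pointer.
async fn dispatch_boxed(op: Op) -> usize {
    match op {
        Op::CreateTable => Box::pin(create_table()).await,
        Op::DeleteTable => Box::pin(delete_table()).await,
    }
}

// Minimal executor so the sketch runs without tokio. It busy-polls, which is
// only acceptable here because these futures never return Pending.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}

fn main() {
    println!("inline: {} bytes", size_of_val(&dispatch_inline(Op::CreateTable)));
    println!("boxed:  {} bytes", size_of_val(&dispatch_boxed(Op::CreateTable)));
    println!("result: {}", block_on(dispatch_boxed(Op::DeleteTable)));
}
```

The inline version is at least as large as its biggest arm, even when a small arm is selected. The boxed version stays pointer-sized plus bookkeeping and pays one heap allocation per call, which is negligible for a menu-driven CLI.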

Thanks for your response. I am not using any special settings for debug builds right now. I do have opt-level=0 for release builds though, and those work fine. The debug builds are what I'm having a problem with. Good information to have though!

This is a pretty reasonable approach. While I would prefer to use the #[tokio::main] attribute, and leave it at that, it seems I may just need to call spawn() after all, and set the stack size for that thread.

This is a good idea. Most of my data structures are in a Vec<T> though, which I thought was heap allocated. That's partially why I'm surprised that I'm running into stack overflows. 🤔

Also to @Coding-Badly, thanks for sharing your inputs. Good to know it's something others have run into. Using the linker settings to change the stack size seems a little involved ... maybe I will try that if I can't get things going with simply spawning a new thread and setting the stack size there.

These futures you're awaiting aren't in a Vec 🙂
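
A small std-only illustration of that point follows, using hypothetical function names. A `Vec` keeps its elements on the heap, so a future that holds a `Vec` across an `.await` carries only the `Vec`'s small header. A future that holds a fixed-size array, or another large future, carries all of it inline. The future itself still lives wherever it is stored, which is on the stack unless it is boxed.

```rust
use std::hint::black_box;
use std::mem::size_of_val;

// A no-op future used only to create an `.await` point.
async fn yield_point() {}

// Holds 64 KiB inline: the array itself becomes part of the future's state.
async fn with_array() -> usize {
    let data = [0u8; 64 * 1024];
    yield_point().await;
    black_box(&data).len()
}

// Holds a Vec: only its (pointer, capacity, length) header is in the
// future's state; the 64 KiB buffer is allocated on the heap.
async fn with_vec() -> usize {
    let data = vec![0u8; 64 * 1024];
    yield_point().await;
    black_box(&data).len()
}

fn main() {
    println!("future holding an array: {} bytes", size_of_val(&with_array()));
    println!("future holding a Vec:    {} bytes", size_of_val(&with_vec()));
}
```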

So you don't optimise release builds at all? That doesn't seem right.

Sorry, I actually have it set to "z". I typically follow the recommendations here: GitHub - johnthagen/min-sized-rust: 🦀 How to minimize Rust binary size 📦

Oh, that's good to know!

Hmmm, somewhat related, I wonder if artificially limiting the number of worker threads that tokio spawns would help reduce stack size. Right now, I just let it use the default.

Why would that help? For stack overflow, what matters is the size of the stack for each individual thread, not the total for all threads.

Since you're using "z", you're not using 0, so it hasn't been shown that opt-level 0 is free of these stack overflows. Debug builds use 0 by default, so you should try @kornel's suggestion and use 1 for debug builds.

I thought maybe the extra threads would add some overhead to the main thread? Just a theory.

No, each thread's stack is independent.