I'm writing a simple Rust CLI tool that uses the inquire crate to prompt a user to take various ad-hoc actions against cloud resources.
Think common operations like:
- Create / delete / update an Amazon DynamoDB table
- Attach an AWS IAM policy to a Role
My main function has an "outer" infinite loop that calls a function to prompt the user for which action to take, and then calls out to various Rust modules that prompt them for details (e.g. which resource to delete?) and perform the action.
#[tokio::main]
async fn main() {
    loop {
        let _ = select_function().await;
    }
}
I have an enum that defines the actions a user can take when prompted.
#[derive(strum::Display, EnumIter)]
enum AwsOperation {
    #[strum(to_string = "DynamoDB: Create Table")]
    DynamoDBCreateTable,
    #[strum(to_string = "DynamoDB: Update Table")]
    DynamoDBUpdateTable,
    // .... around 20 other operations
}
Then in my select_function() function, I prompt the user for which operation they want to use, and each enum variant triggers a function in a child module.
async fn select_function() -> Result<()> {
    let function_list = AwsOperation::iter().map(|i| i.to_string()).collect::<Vec<_>>();
    let selected_function = inquire::Select::new("Select a function", function_list).prompt()?;

    // Find the corresponding enum variant based on the selected string
    let operation = AwsOperation::iter()
        .find(|op| op.to_string() == selected_function)
        .unwrap_or_else(|| {
            let color = colored::Color::TrueColor { r: 220, g: 80, b: 110 };
            println!("{0}", "Invalid function selected".color(color));
            AwsOperation::Exit // Default to Exit if no match found
        });

    match operation {
        AwsOperation::DynamoDBCreateTable => dynamodb::create_table().await,
        AwsOperation::DynamoDBDeleteTable => dynamodb::delete_table().await,
        AwsOperation::AWSSelectCredentialProfile => util::set_aws_profile().await,
        AwsOperation::AWSSelectRegion => util::set_region().await,
        // ... a bunch more operations implemented here ....
        AwsOperation::Exit => std::process::exit(0),
    };

    Ok(())
}
The Problem
This program was recently working pretty well on Windows 11 and macOS, but after making some recent updates to it, I started getting stack overflow errors for the main thread on Windows 11.
This is really surprising to me, because I am not dealing with very large data sets here. The service-specific AWS SDKs for things like IAM, DynamoDB, S3, CloudFormation, etc. simply allow you to retrieve a list of resources and then call various APIs against those. I'm currently working against an AWS development account that does not have large numbers of resources, so it's not like I'm loading in hundreds or thousands of cloud resources into memory.
During my research, I discovered that Windows 11 has a small default thread stack size of 1 MB. I also discovered that tokio allows you to set a thread stack size, but this does NOT affect the main thread, only the child / spawned threads created by the tokio runtime. I also discovered that compiling a release build may reduce how much data ends up on the stack.
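One experiment that could confirm the main-thread theory is to move the whole program onto a thread whose stack size is chosen explicitly. The sketch below uses only the standard library; `run_with_stack` and `recurse` are hypothetical names, `recurse` is just a stand-in for deep call chains, and the 64 MiB figure is an arbitrary assumption rather than a recommendation.

```rust
use std::hint::black_box;
use std::thread;

/// Runs `work` on a new thread with a `stack_bytes` stack and returns its result.
/// (Hypothetical helper, not part of tokio or inquire.)
fn run_with_stack<T, F>(stack_bytes: usize, work: F) -> T
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    thread::Builder::new()
        .name("big-stack-main".into())
        .stack_size(stack_bytes)
        .spawn(work)
        .expect("failed to spawn thread")
        .join()
        .expect("worker thread panicked")
}

/// Stand-in for a deep call chain: every frame keeps a 1 KiB buffer alive.
fn recurse(depth: u32) -> u32 {
    let buf = black_box([0u8; 1024]);
    if depth == 0 {
        return buf[0] as u32;
    }
    1 + recurse(depth - 1) + black_box(buf[0] as u32)
}

fn main() {
    // 4000 frames of at least 1 KiB each need roughly 4 MiB or more,
    // which would not fit in a 1 MB main-thread stack.
    let frames = run_with_stack(64 * 1024 * 1024, || recurse(4_000));
    println!("completed {} nested calls", frames);
}
```

In the real program, the closure passed to `run_with_stack` would presumably build the runtime itself (for example with `tokio::runtime::Builder::new_multi_thread().enable_all().build()`) and call `block_on` on the prompt loop instead of relying on `#[tokio::main]`. That part is untested here. If the overflow disappears with a larger stack, the 1 MB limit is at least part of the story.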
What I've Tried
- Switching from Rust 1.87.0 stable to 1.88.0-beta.6 - resulted in the same stack overflow problems
- Build Docker container on Docker Desktop (Linux WSL2 on Windows 11) - this successfully worked around the stack overflow issue (for now)
- Compile a release build with cargo run --release - this successfully worked around the stack overflows (for now)
To the best of my knowledge, I am not doing any explicit memory allocation on the stack, at least nothing that would dramatically increase the stack size. Each of my child modules simply calls the necessary AWS APIs, stores the resource details / results, and then prompts the user to take an additional action based on the returned data, or goes back to the main action menu.
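One thing that may be worth measuring, even without explicit stack allocation, is the size of the futures themselves: an async fn compiles to a state machine that stores every value kept alive across an `.await`, and a caller that awaits it inline embeds that whole state in its own future. The sketch below is an assumption about what could be happening, not a diagnosis of this program; `tiny`, `big_step`, `menu_inline` and `menu_boxed` are hypothetical names, and the 16 KiB buffer stands in for whatever state an SDK call might hold.

```rust
use std::mem::size_of_val;

async fn tiny() {}

/// Hypothetical stand-in for a heavy operation: the buffer is used after
/// the `.await`, so it has to be stored inside the future's state.
async fn big_step() -> u8 {
    let buf = [0u8; 16 * 1024];
    tiny().await;
    buf[0]
}

/// Awaiting inline embeds the whole `big_step` future in this future.
async fn menu_inline() -> u8 {
    big_step().await
}

/// Boxing moves the inner future to the heap; only a pointer is stored here.
async fn menu_boxed() -> u8 {
    Box::pin(big_step()).await
}

fn main() {
    // The futures are only created, never polled; we just inspect their size.
    let inline = menu_inline();
    let boxed = menu_boxed();
    println!("inline future: {} bytes", size_of_val(&inline));
    println!("boxed future:  {} bytes", size_of_val(&boxed));
}
```

If the per-operation futures in the real program turn out to be large, wrapping the calls in the `match` arms with `Box::pin(...)` would be one way to keep that state off the main thread's stack.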
- Is this a common issue for Windows applications?
- Is this a common issue for users of the tokio runtime?
- What sorts of things should I be checking for, to identify stack overflow root causes?
- I'm running the application in the VSCode debugger, with the CodeLLDB extension, but I can't figure out what I should be looking at next.