When I run cargo build and target/debug/myapp I get
thread 'main' has overflowed its stack
fatal runtime error: stack overflow
Aborted (core dumped)
But when I run cargo build -r and target/release/myapp, it works OK.
I tried to run RUST_MIN_STACK=134217728 target/debug/myapp but that didn't help. I also tried to increase Tokio stack size by running TOKIO_THREAD_STACK_SIZE=134217728 RUST_MIN_STACK=134217728 target/debug/myapp after installing tokio_env but that didn't help either.
I found a location in one function where I call another function and that overflows the stack so I know exactly where the problem is. The problem is that there is nothing wrong with the function -> I need to use the stack...
are you calling Box::new() with very large data? if so, this is a known limitation of Rust's move semantics: the value is first built on the stack and only then moved into the heap allocation. see e.g. #52827.
there are workarounds for some cases, but which one applies depends on what kind of data you are putting into the Box. for instance, if your type can be constructed incrementally in place, you can use Box::new_uninit(). if the data is a large array, try to create a Vec first and then use Vec::into_boxed_slice() to turn it into a boxed slice. if your type can be "zero-initialized", maybe try bytemuck::zeroed_box()
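A minimal sketch of the Vec route, assuming the large data is a plain byte buffer. The names LEN, big_buffer and big_array are made up for illustration and the size is arbitrary. The vec! macro allocates directly on the heap, so the buffer never has to exist as a stack temporary first:

```rust
use std::convert::TryFrom;

// Hypothetical size for illustration: 16 MiB, more than a typical main-thread stack.
const LEN: usize = 16 * 1024 * 1024;

// Build the buffer on the heap via Vec, then convert it to a boxed slice.
// Unlike Box::new([0u8; LEN]), no LEN-sized temporary is created on the stack.
fn big_buffer() -> Box<[u8]> {
    vec![0u8; LEN].into_boxed_slice()
}

// If a fixed-size array type is required, a boxed slice of the right length
// can be converted into a boxed array without copying through the stack.
fn big_array() -> Box<[u8; LEN]> {
    Box::<[u8; LEN]>::try_from(big_buffer()).expect("slice length equals LEN")
}

fn main() {
    let slice = big_buffer();
    let array = big_array();
    println!("slice len = {}, array len = {}", slice.len(), array.len());
}
```

Whether this helps depends on where the large value actually comes from. It does nothing for a function that simply has many large locals.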
If you just want to be able to run it in dev mode (reasonable), you can enable optimizations without disabling debug symbols. Add the following to Cargo.toml:
[profile.dev]
opt-level = 2
(You can try 1 and 3 as well. I think the default for profile.release is opt-level = 3, but 2 is a good starting point)
There's no need to suffer with slow and inefficient dev builds if you don't have to.
I didn't want to share it since it is a lot of code in multiple files. But I can describe...
In the main function I parse arguments using clap, and then I call an async function that reads a file and parses its content. The problem is the parsing function, which is not async. That function uses a lot of stack and returns a large struct with impl Display, which I print to stdout.
I tested that when I remove some parsing (simply by not calling some functions inside the main function), then the program works OK.
(btw I'm not using recursion at all in the parsing functions - just a lot of text processing....)
clippy::large_stack_frames correctly identified the main parsing function: 2639056 bytes on the stack
The problem is that the main parsing function just calls all (10) sub functions that do all the parsing work. I might try to split 10 sub functions into 2 functions and each will call 5 sub functions...
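One more thing that may be worth trying, sketched below under assumptions: RUST_MIN_STACK only sets the default stack size for threads spawned by Rust, not for the main thread, so it has no effect if the parsing runs on the main thread. Running the stack-hungry call on a dedicated thread created with std::thread::Builder::stack_size gets around that. The names run_with_big_stack and parse_all are hypothetical stand-ins, not code from this thread, and the sizes are arbitrary:

```rust
use std::thread;

// Run `f` on a freshly spawned thread with an explicit stack size and
// return its result. Panics if the thread cannot be spawned or if `f` panics.
fn run_with_big_stack<F, T>(stack_bytes: usize, f: F) -> T
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    thread::Builder::new()
        .stack_size(stack_bytes)
        .spawn(f)
        .expect("failed to spawn worker thread")
        .join()
        .expect("worker thread panicked")
}

// Hypothetical stand-in for a parsing function with a large stack frame.
fn parse_all() -> usize {
    let scratch = [1u8; 512 * 1024]; // large local living on the stack
    scratch.iter().map(|&b| b as usize).sum()
}

fn main() {
    // 64 MiB is an arbitrary choice; pick whatever the real frame needs.
    let total = run_with_big_stack(64 * 1024 * 1024, parse_all);
    println!("parsed total = {}", total);
}
```

This only moves the problem to a bigger stack. The large frames and copies are still there, so shrinking them is the better long-term fix.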
When you call async functions with potentially a lot of state or many nested calls to other async functions, wrap them in Box::pin.
Change foo().await to Box::pin(foo()).await. This will remove the state of foo from the caller, making caller's Future and stack usage smaller. Apply that in several places.
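To see the effect without a runtime, here is a small sketch that only measures future sizes. The functions big, caller_inline and caller_boxed are invented for illustration. The futures are created but never polled, so no executor is needed:

```rust
use std::mem::size_of_val;

// A future with a lot of state: `buf` is still in use after the await point,
// so it has to be stored inside the future itself.
async fn big() -> usize {
    let buf = [1u8; 4096];
    std::future::ready(()).await;
    buf.iter().map(|&b| b as usize).sum()
}

// Awaiting directly embeds the state of `big` in the caller's future.
async fn caller_inline() -> usize {
    big().await
}

// Boxing moves the state of `big` to the heap; the caller's future only
// needs to hold a pointer to it.
async fn caller_boxed() -> usize {
    Box::pin(big()).await
}

fn main() {
    // Constructing a future does not run it, so measuring its size is cheap.
    let inline = size_of_val(&caller_inline());
    let boxed = size_of_val(&caller_boxed());
    println!("inline caller future: {} bytes", inline);
    println!("boxed caller future:  {} bytes", boxed);
}
```

The exact numbers depend on the compiler version, but the boxed caller should come out far smaller than the inline one. The trade-off is one heap allocation per call.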
Why not? It still keeps the large unnecessary stack copies in the code. Plus the linker arg may be platform-specific (I don't expect it to work with msvc link.exe)
Another option is to add [profile.dev] opt-level=1 or 2, to get these futures optimized in debug builds too.
If you put Box::pin around less-used code paths (error handling, fallbacks, one-time initialization) it's beneficial overall anyway, because you get smaller future structs, and less cold code in the poll functions on the hot path.