thread 'tokio-runtime-worker' has overflowed its stack
I have tracked it down to invoking a method in a large, code-generated source file. The source file is about 54K lines.
The code gen builds a static map of fn pointers; each fn is a small block of code that accesses data in structs (sometimes nested in `Option`s when a node is not required). The static map has around 2400 elements, so it is not huge. There are also some simple methods that read from the map and invoke the selected fn.
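Roughly, the generated pattern looks like this (a heavily simplified sketch; the struct, field, and fn names here are all made up):

```rust
use std::collections::HashMap;
use std::sync::LazyLock; // stable since Rust 1.80

// Made-up stand-in for the generated structs.
pub struct Record {
    pub name: Option<String>,
}

// Each entry is a plain fn pointer, not a closure or trait object.
type Accessor = fn(&Record) -> Option<&str>;

fn get_name(r: &Record) -> Option<&str> {
    r.name.as_deref()
}

// The real map has ~2400 entries; one is shown here.
static ACCESSORS: LazyLock<HashMap<&'static str, Accessor>> = LazyLock::new(|| {
    let mut m: HashMap<&'static str, Accessor> = HashMap::new();
    m.insert("name", get_name as Accessor);
    m
});

// One of the "simple methods" that reads from the map and invokes the fn.
pub fn read_field<'a>(key: &str, record: &'a Record) -> Option<&'a str> {
    ACCESSORS.get(key).and_then(|f| f(record))
}
```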
One odd thing: the same code runs fine on my Mac; this only happens on Linux.
Since there is no recursion and this code is sync (not a future), I cannot see anything obvious. I even tried getting the type sizes with a nightly build (rustc's `-Zprint-type-sizes`); the only things that looked sizable were from low-level crates used by our own code or by our dependencies.
Could it just be the size of the source file? If so, and I broke it into, say, 20 source files, all reachable as modules from a single source file, would that help? I would still need to load every fn pointer into a single static map; I do not see how to get around that.
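For what it's worth, splitting the file would not change the single-map requirement: each generated module could export its own entry list, and the top-level map could simply concatenate them. A sketch, reusing the `Record`/`Accessor` types from the sketch above:

```rust
use std::collections::HashMap;
use std::sync::LazyLock;

mod generated_a {
    use super::{Accessor, Record};

    fn get_name(r: &Record) -> Option<&str> {
        r.name.as_deref()
    }

    // Each generated module exports its own slice of entries...
    pub fn entries() -> Vec<(&'static str, Accessor)> {
        vec![("name", get_name as Accessor)]
    }
}

// ...and the single static map concatenates them all at first use.
static ACCESSORS: LazyLock<HashMap<&'static str, Accessor>> = LazyLock::new(|| {
    generated_a::entries()
        .into_iter()
        // .chain(generated_b::entries()) and so on, one chain per module
        .collect()
});
```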
Clippy has a lint for large futures. If you're not using `Box::pin(async { ... }).await`, you may end up with the entire state of your whole application as one `Future` on the stack.
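The lint is `clippy::large_futures`, and it is not enabled by default; the quickest way to turn it on is a crate-level attribute:

```rust
// At the top of main.rs (the crate root); the lint is allow-by-default.
#![warn(clippy::large_futures)]
```

Then a plain `cargo clippy` run will flag any future above the size threshold (which, I believe, can be tuned via `future-size-threshold` in `clippy.toml`).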
I tried various combinations, and while we have it working with a bigger stack size, the results are confusing.
Here is what we tried:
- Change from `#[tokio::main]` to creating our own runtime.
- Wrap the initial future in `Box::pin`. (On the Mac this works fine; it worked before as well.)
- Set the `thread_stack_size` manually to smaller sizes until an overflow happens on the Mac. That does not happen until about 450K, and when it does, the overflow is much earlier in the flow, so it doesn't really help us narrow it down.
Various stack sizes:
- 2MB (the default): Mac works, Linux crashes with an overflow
- 4MB: same thing
- 8MB: both Mac and Linux work
It is odd that the two platforms behave differently (they stop at different places), which makes this hard to debug. I would have thought `Box::pin` would solve it; I read an excellent blog post that explains why. If this moves the main `Future` to the heap, shouldn't the issue be resolved?
We can live with it for now by using a larger stack size (8MB is not unreasonable), but I would like to know how to find the underlying issue. And the difference between Linux and Mac, I suppose, can be attributed to compiler differences?
Any light that can be shed on this will be much appreciated.
This is a commercial application, an entire server, so it is many files and does a lot. What I can share, if helpful, is this part of `main`:
```rust
let rt = tokio::runtime::Builder::new_multi_thread()
    .enable_all()
    // Explicitly set the worker thread stack size; 8MB seems to be required on Linux.
    .thread_stack_size(1024 * 8000)
    .build()
    .expect("Could not start tokio Runtime");

rt.block_on(Box::pin(inner_start_server(addr)))
```
`inner_start_server` isolates all the async fns.
We have yet to try clippy for `large_futures`; we will do that later today, once we learn how to use it :).
Try `Box::pin` inside your server code, in multiple places, for smaller call sub-trees. Rust is not very smart about `Box`: it sometimes builds the value on the stack and then copies it into the `Box`, at which point it is too late, because the too-large object has already lived on the stack. Box each future while it is still small.
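Something like this (handler names hypothetical): box each branch near where it is awaited, instead of relying only on the single `Box::pin` around the whole server future:

```rust
// Hypothetical handlers; imagine each holding large locals across .await points.
async fn handle_query() { /* ... */ }
async fn handle_update() { /* ... */ }

async fn dispatch(is_query: bool) {
    // Without Box::pin, each handler's entire state machine is embedded inline
    // in dispatch's future; with it, only a heap pointer is held across the .await.
    if is_query {
        Box::pin(handle_query()).await;
    } else {
        Box::pin(handle_update()).await;
    }
}
```

The earlier a sub-tree gets boxed, the less of its state ever has to exist on the stack before it moves to the heap.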