How to diagnose a `stack overflow` issue's cause?

ciprian.craciun · May 9, 2018, 12:07pm

As the title says, sometimes the user gets a stack overflow error like the following:

thread 'main' has overflowed its stack
fatal runtime error: stack overflow
Aborted

And, especially when the call "hierarchy" is deeply nested, he has no clue where the issue is "coming from".

Therefore my questions are:

(A) Is there a way to get a hint of the last few functions called, so that the user can start a targeted debugging of the problem? (Currently he has no hint of the probable cause...)

(B) Is there a way for the user code to "catch" this kind of error? Similar to how the user can intercept panics?

Hopefully there is an answer for (A), at least in debug mode.

For (B) one could -- if it makes sense for the particular use-case -- start a separate thread and catch if it aborts from the main thread. (For example my use-case is a Scheme interpreter, thus this technique, although cumbersome, could be used to provide more useful error messages.)
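A minimal sketch of the separate-thread idea, using only `std::thread::Builder` from the standard library. The names `eval_depth` and `run_on_big_stack` are hypothetical stand-ins for the interpreter. Note that `join()` only surfaces panics: a stack overflow on the worker thread still aborts the whole process, so in practice the thread mainly buys a larger, explicitly sized stack.

```rust
use std::thread;

// Hypothetical stand-in for the interpreter's recursive evaluator.
fn eval_depth(n: u64) -> u64 {
    if n == 0 {
        0
    } else {
        1 + eval_depth(n - 1)
    }
}

// Runs `job` on a worker thread with an explicit stack size.
// A panic in the worker is reported as Err; a stack overflow is NOT
// reported here, because it aborts the process instead of unwinding.
fn run_on_big_stack<T, F>(stack_bytes: usize, job: F) -> Result<T, String>
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    let handle = thread::Builder::new()
        .name("interpreter".into())
        .stack_size(stack_bytes)
        .spawn(job)
        .map_err(|e| format!("could not spawn thread: {e}"))?;
    handle
        .join()
        .map_err(|_| "worker thread panicked".to_string())
}

fn main() {
    // 64 MiB is an assumed value; pick one that fits the workload.
    let result = run_on_big_stack(64 * 1024 * 1024, || eval_depth(100_000));
    println!("{:?}", result);
}
```

The stack size is the only knob here; the sketch does not make the overflow itself recoverable.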

Thanks for the feedback,
Ciprian.

vitalyd · May 9, 2018, 12:38pm

stack overflow is an aborting panic, it doesn't unwind and is not catchable. You'd need the compiler to insert stack probes (sometimes referred to as stack banging) and initiate unwinding if the probe fails; this would necessarily incur a perf penalty.

So the best bet for debugging, for the moment, is to open the core in a debugger (e.g. gdb) and inspect the callstack there. From a user's perspective, this isn't all that easy necessarily, and they may not even know how to debug via the core file.
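Because the overflow itself cannot be caught, an interpreter can sidestep it by counting its own recursion depth and returning an ordinary error before the real stack runs out. The sketch below assumes a toy evaluator; `Expr`, `EvalError`, `eval`, `nested` and the `MAX_DEPTH` value are all hypothetical and would need tuning against the real frame size.

```rust
// Hypothetical expression type for a tiny interpreter.
#[derive(Debug)]
enum Expr {
    Num(i64),
    Add(Box<Expr>, Box<Expr>),
}

#[derive(Debug, PartialEq)]
enum EvalError {
    TooDeep { limit: usize },
}

// Assumed limit; tune it to the stack size and frame size of the real evaluator.
const MAX_DEPTH: usize = 1_000;

// Evaluates `expr`, refusing to recurse deeper than MAX_DEPTH levels.
fn eval(expr: &Expr, depth: usize) -> Result<i64, EvalError> {
    if depth > MAX_DEPTH {
        return Err(EvalError::TooDeep { limit: MAX_DEPTH });
    }
    match expr {
        Expr::Num(n) => Ok(*n),
        Expr::Add(a, b) => Ok(eval(a, depth + 1)? + eval(b, depth + 1)?),
    }
}

// Builds a left-nested chain 1 + 1 + ... containing `n` additions.
fn nested(n: usize) -> Expr {
    let mut e = Expr::Num(1);
    for _ in 0..n {
        e = Expr::Add(Box::new(e), Box::new(Expr::Num(1)));
    }
    e
}

fn main() {
    println!("{:?}", eval(&nested(10), 0));
    println!("{:?}", eval(&nested(2_000), 0));
}
```

The error value can carry whatever context the interpreter has at hand (current Scheme procedure, source location), which is exactly the information the abort message lacks.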

ciprian.craciun · May 9, 2018, 12:43pm

> You'd need the compiler to insert stack probes (sometimes referred to as stack banging) and initiate unwinding if the probe fails; this would necessarily incur a perf penalty.

Could you point me in the direction of how to enable such a feature in Rust? (Or is it not currently available?)

vitalyd · May 9, 2018, 12:49pm

Yes, it's not available AFAIK - I was theorizing about what would need to occur. This is the approach that some of the managed runtimes (e.g. JVM) take.

Right now Rust uses guard pages to trap the stack overflow (which is a segfault at the kernel level), so it can at least tell you that it's a stack overflow and not some other segfault.

ciprian.craciun · May 9, 2018, 12:55pm

I understand. So basically there is currently no easily accessible way to directly diagnose stack overflow aborts.

However approaching the issue from another angle: is there in Rust a simple way to trace -- printing on stderr or similar -- all the entries (and possibly exits) from user functions?

Perhaps a procedural macro crate that would allow one to "annotate" all functions of interest? Or even better an external tool similar to kcov that would enable this?
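As a rough illustration of the tracing idea without any external crate, a small RAII guard can print function entries and exits to stderr. Everything here (`TraceGuard`, `DEPTH`, the `trace!` macro, `fact`) is a hypothetical sketch, not an existing library API. When the stack overflows, the exit lines are never printed, so the last few `->` lines on stderr hint at where the recursion was.

```rust
use std::cell::Cell;

thread_local! {
    // Current nesting depth on this thread, used only for indentation.
    static DEPTH: Cell<usize> = Cell::new(0);
}

// Hypothetical guard: logs on creation (entry) and on drop (exit).
struct TraceGuard {
    name: &'static str,
}

impl TraceGuard {
    fn new(name: &'static str) -> Self {
        let depth = DEPTH.with(|d| {
            let cur = d.get();
            d.set(cur + 1);
            cur
        });
        // stderr is unbuffered, so the line is written before any later abort.
        eprintln!("{:indent$}-> {}", "", name, indent = depth * 2);
        TraceGuard { name }
    }
}

impl Drop for TraceGuard {
    fn drop(&mut self) {
        let depth = DEPTH.with(|d| {
            let cur = d.get() - 1;
            d.set(cur);
            cur
        });
        eprintln!("{:indent$}<- {}", "", self.name, indent = depth * 2);
    }
}

// Put `trace!("name");` on the first line of each function of interest.
macro_rules! trace {
    ($name:expr) => {
        let _trace_guard = TraceGuard::new($name);
    };
}

fn fact(n: u64) -> u64 {
    trace!("fact");
    if n <= 1 {
        1
    } else {
        n * fact(n - 1)
    }
}

fn main() {
    println!("{}", fact(3));
}
```

Each guard adds a little to the stack frame, so a traced build may overflow slightly earlier than an untraced one.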

dylan.dpc · May 9, 2018, 2:30pm

RUST_BACKTRACE=1

vitalyd · May 9, 2018, 2:39pm

This doesn't do anything for stack overflows because, as mentioned, they abort.

gilescope · May 11, 2018, 9:55pm

The trace macro might help you zero in on what's happening. I agree it's sub-optimal that Rust can't tell you on a debug build what the last function called was. It would be great to at least try something like that out behind a feature.

https://github.com/gsingh93/trace

gilescope · May 16, 2018, 12:34pm

It seems that the compiler now can insert stack probes:

github.com/rust-lang/rust: "rustc: Implement stack probes for x86" (rust-lang:master ← alexcrichton:probestack, opened by alexcrichton, 01:38AM - 22 Jun 17 UTC, +170 -3)

This commit implements stack probes on x86/x86_64 using the freshly landed support upstream in LLVM. The purpose of stack probes here are to guarantee a segfault on stack overflow rather than having a chance of running over the guard page already present on all threads by accident. At this time there's no support for any other architecture because LLVM itself does not have support for other architectures.

Is there an outstanding ticket for tracking this issue as it is not uncommon to overflow a stack while coding?

vitalyd · May 16, 2018, 12:50pm

That's for x86 only and it's meant to circumvent the issue of an allocation overrunning the guard page. I think you'd need to (a) have reliable stack probes on all (tier 1, at least) platforms and (b) wire up unwinding to trigger when the probe fails.

There's https://github.com/rust-lang-nursery/rust-wasm/issues/141, which is fairly recent, but for wasm (and is closed now). I don't know of any existing tracking issue for unwindable stack overflows. Maybe someone else does though ...

gilescope · June 7, 2018, 12:03am

I couldn't find one, so I've raised one here: Great stack overflow error messages · Issue #51405 · rust-lang/rust · GitHub

sarowar · September 4, 2020, 8:00pm

I'll just leave a workaround here for fledglings like me.
I used lldb, gdb should work too, I think.

$ lldb target/debug/app_bin
# enter the lldb command to start the process
(lldb) process launch

The stack overflow will force the process to stop:

Process 2164815 launched: '/path/to/project/target/debug/app_bin' (x86_64)
Process 2164815 stopped
* thread #1, name = 'app_bin', stop reason = signal SIGSEGV: invalid address (fault address: 0x7fffff7fe2f8)
    frame #0: 0x00005555566c204a app_bin`__rust_probestack + 23
app_bin`__rust_probestack:
->  0x5555566c204a <+23>: testq  %rsp, 0x8(%rsp)

To see the call stack

(lldb) thread backtrace
