Reverse stack trace order

This whole topic is a copy of https://github.com/rust-lang/rust/issues/38873. I think this may be a better place for what could be a very heated debate.

A big improvement to the speed at which I can find the origin of a panic would be to reverse the order of the stack trace. At the moment the last line of the stack trace is always main. This means the stack trace is written deep->shallow, or in reverse(!) chronological order:

1:        0x10977219c - function_that_caused_the_panic
2:        0x1097738ee - foo
3:        0x109773590 - bar
4:        0x109773cf6 - baz
5:        0x109773b94 - badger
6:        0x109757669 - main

What you are generally looking for is the place that caused the panic (e.g. the line with an assert! or panic! on it), and so I think it would make sense to have that line last so that it is easy to find. This is particularly important when the call stack is deep and it fills your terminal. Here is a more in-depth argument for this.

So my suggestion is: reverse the stack trace to print the stack frames in chronological order, shallow->deep, with main on the top and the stack frame with the panic at the end:

6:        0x109757669 - main
5:        0x109773b94 - badger
4:        0x109773cf6 - baz
3:        0x109773590 - bar
2:        0x1097738ee - foo
1:        0x10977219c - function_that_caused_the_panic

Until I read the blog linked to above I didn't realize that most languages – including Rust – was doing it "wrong", and that there was a better way. In C++ I switched the order I print stack traces about a year ago, and since then it has saved me a lot of scrolling.

4 Likes

I don't see any reasoning why a reverse stack trace would be more useful: I would say that more often than not, it's actually the middle of the stack that is more useful to diagnosing the error. Many times, some value has been set to an incorrect value, hasn't been initialized or some other issue has occured before the actual error itself, that is the root cause of the error.

Another reason why "reverse" order might be preferrable, is that it matches programmers' mental model of what a stack looks like. Usually, when people are taught about stacks, the mental picture is that of a stack of "things" that grows upwards, then things are taken off, put back on, etc.

Finally, as you mention, most languages do it this way. Turning it around, without a very good reason, would put extra barrier to entry, one that doesn't really provide much benefit.

Just to put something to the article about "usability" of the stack trace and watching that there is a ongoing PR to "clean" a little the output, I will add this

Taking the original example

thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: ()', src/libcore/result.rs:837
stack backtrace:
   1:     0x56381e42288e - std::sys::imp::backtrace::tracing::imp::write::hfc2edb670e5eda97
   2:     0x56381e424b91 - std::panicking::default_hook::{{closure}}::hc66a547fab0b4d38
   3:     0x56381e424776 - std::panicking::default_hook::h7fba1b2f69474bdc
   4:     0x56381e424f78 - std::panicking::rust_panic_with_hook::h5d3597668c9f0035
   5:     0x56381e424e12 - std::panicking::begin_panic::hfaa38fdbc7d103f6
   6:     0x56381e424d50 - std::panicking::begin_panic_fmt::h69e0397ed8fc5362
   7:     0x56381e424cd1 - rust_begin_unwind
   8:     0x56381e46db2f - core::panicking::panic_fmt::h2ba266031787cf7b
   9:     0x56381e41cc55 - core::result::unwrap_failed::h0a78b8b4914ea99c
                               at /home/yamakaky/dev/rust/rust/src/libcore/macros.rs:29
  10:     0x56381e41c99b - <core::result::Result<T, E>>::unwrap::h2f7c43c0e2d4b7cd
                               at /home/yamakaky/dev/rust/rust/src/libcore/result.rs:737
  11:     0x56381e41ccd8 - t::main::h86d9b74d94eb7fff
                               at /tmp/pote/t.rs:2
  12:     0x56381e42f0f6 - __rust_maybe_catch_panic
  13:     0x56381e4255a8 - std::rt::lang_start::ha33d35ddc5bfd7d2
  14:     0x56381e41cd12 - main
  15:     0x7f6dc966c290 - __libc_start_main
  16:     0x56381e41c7f9 - _start

Can be put some like

thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: ()', src/libcore/result.rs:837
stack backtrace:
1: std::sys::imp::backtrace::tracing::imp::write <- std::panicking::default_hook::{{closure}}
3: std::panicking::default_hook <- std::panicking::rust_panic_with_hook <- std::panicking::begin_panic
6: std::panicking::begin_panic_fmt <- rust_begin_unwind
8: core::panicking::panic_fmt <- core::result::unwrap_failed
                               at /home/yamakaky/dev/rust/rust/src/libcore/macros.rs:29
10: <core::result::Result<T, E>>::unwrap
                               at /home/yamakaky/dev/rust/rust/src/libcore/result.rs:737
11: t::main::h86d9b74d94eb7fff at /tmp/pote/t.rs:2 <-  __rust_maybe_catch_panic
13: std::rt::lang_start::ha33d35ddc5bfd7d2 <- main <- __libc_start_main <- _start

Even more it could be only the list of the functions and on each change (from file or library, print the switch)

thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: ()', src/libcore/result.rs:837
stack backtrace:
1: std::sys::imp::backtrace::tracing::imp::write, std::panicking::default_hook::{{closure}},
3: default_hook, rust_panic_with_hook, begin_panic, begin_panic_fmt, rust_begin_unwind,
8: core::panicking::panic_fmt, core::result::unwrap_failed at /home/yamakaky/dev/rust/rust/src/libcore/macros.rs:29
10: <core::result::Result<T, E>>::unwrap at /home/yamakaky/dev/rust/rust/src/libcore/result.rs:737
11: t::main at /tmp/pote/t.rs:2, __rust_maybe_catch_panic, std::rt::lang_start, main, __libc_start_main,  _start

Also I have an extra question... backtraces are for know where... but what about the data?? why it is never printed? it would be nice i the callback can be unrolled with the passed arguments to each funtion (well that can be done in a debugger, thought maybe in one or two places coul clear it up a little more, I mean normally if you are working on terminal you see the stack trace, the next step would be to maybe put a log there to print the values ;).

How can you be sure that the data will not leak across a trust boundary?

The stack trace shows where the execution would have ended up if the panic had not occurred. It does not show history, but the future. (The difference matters with tail calls.) As such, it is actually in chronological order.

3 Likes

Every debugger I've ever used prints the most recent stack frame first. I see no reason to deviate from this in Rust.

3 Likes

Also I have an extra question... backtraces are for know where... but what about the data?? why it is never printed?

I love this feature. In my C++ logging library this is an opt-in feature called ERROR_CONTEXT: GitHub - emilk/loguru: A lightweight C++ logging library. Being opt-in it does not leak unwanted (sensitive) data. It is also extremely cheap (a few cycles).

I would love to have something like this in std Rust.

+1 on Reverse ( Lets call it "correct" ) order.

In my IDE a compile comes down to:

  • Code
  • Shift+F10 ( Run / Compile )
  • Error in Terminal output
  • Grab mouse
  • Scroll and check error
  • Put focus back to Editor
  • Fix Error

In a reverse order, this becomes:

  • Code
  • Shift+F10 ( Run / Compile )
  • Error in Terminal output
  • Fix Error

It cuts a large part of a bad habit. Its funny how we get used to doing something ( incorrectly ), that we never question the efficiency of it. It boils down to the issue that we are used to reading on normal pages from top to bottom. But when dealing with IDEs where part of your screen is occupied with a terminal / editor etc, you fo not dedicate a full page to its output.

At minimum the reserve order needs to be a option. For any compiler error / warning.or stack trace.

2 Likes

Related features a http://unroller.rubyforge.org/ it prints all the trace or The Mono log profiler | Mono which can be used to something like that.

But here talking about stack traces, it would be nice if the stack trace could if opted in print the passed data, so you dont need later to "put logs traces to see what is happening" if oyu dont have a real debugger at hand.

For reference, Python prints its tracebacks in the order suggested here, with the most recent call last:

>>> def boom():
...   1 / 0
...
>>> def main():
...   boom()
...
>>> main()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 2, in main
  File "<stdin>", line 2, in boom
ZeroDivisionError: integer division or modulo by zero

I've always felt this made it easy to see the relevant details quickly: the function that triggered the exception is printed right above the exception itself and this is generally right above your terminal cursor.

5 Likes

I'd agree with reversing the stack order, but mainly because it seems more natural to read the stack trace downwards ("main calls x calls y") since that's the order of execution.

Except as @fweimer already pointed out, that's not what a native stack trace represents. It represents "y returns to x returns to main".

Well yes, but I'm not sure the distinction is meaningful until Rust gets tail calls. From the point of view of someone trying to debug, I want to know how it got to the current state which feels more natural to me by scanning down (only my opinion of course).

Tail calls aside, writing code where returns and calls are not 1-1 seems a bit niche? So it becomes a question of user-friendliness, which is fairly subjective.

Rust (or LLVM) can do TCO already -- it's just not guaranteed, not something you can rely on for limiting stack depth. But you may already see the effect of tail calls in your backtraces.

I'd also love to have the ability to know how my code got to a certain point, but this is just not something that's recorded (unless you're using very new processor features). TCO and inlining can easily mess up any idea of the past and only the future is accurately represented.

I feel this discussion is getting slightly out of hand. I think the most important thing is UX - not philosophy or points of views. The 2017 roadmap is all about * productivity* with an explicit mention of "a pleasant edit-compile-debug cycle". A core part of this for many (most?), is to quickly diagnose a panic by viewing terminal output. The question is then what leads to faster diagnosing: having the deepest stack frame close to the terminal cursor, or having main there. I think there are some advantages to both:

  • Current system (main at the bottom): scanning upwards gives us a quick idea of "what lead to the error"
  • Proposed system (panic at the bottom): a quick scan tells you the function that had the problem.

I find arguments from philosophy (this system "feels more natural") or from inertia ("this is how it's always been done") to be far less powerful than arguments from experience!

Now: I have used both systems, and in my experience the proposed system is vastly superior. Has anybody else tried both systems and find the opposite to be true?

3 Likes

My impression is that both myself and @jethrogb are discussing this based on experience. My "seems more natural" was meant in the sense of "I find this more natural" and my interpretation of what @jethrogb says is that they appreciate the 1-1 mapping between what is displayed and what a stack trace actually means (which I do sympathise with, even if I don't value it so highly) - both of these perspectives are rooted in having used the feature.

I'd also note that personal experience is very much a point of view!

I favour the current stack order — main() at the bottom, function that panics at the top. My rationale:

  • What is more "natural":

    If I had never seen a stack trace, I don't know what I would have chosen. Out of the languages that I have worked with (C, C++, Ruby, Java, Python), the first time I was confused was when I hit an exception with Python, because I had already been trained to interpret the stack trace bottom up as the "call order", from the previous languages.

    Unless there is a good reason for the change, I would stick with consistency with other languages because it reduces the 'gotcha' moment.

  • Scrolling - neither clearly wins

    Assume A, B and C are libraries, and numbers represent function calls. A does boilerplate, I am working on project B — which plugs into A, and B depends on C. In the following stack, C3 panics:

    main() |    C3
    A1     |    C2
    A2     |    C1
    B1     |    B2
    B2     |    B1
    C1     |    A2
    C2     |    A1
    C3     |    main()
    

    When my application panics, I usually look at:

    • the panic message (hopefully C3 gives a good message)
    • what the program flow was through B, because that's where it's most likely to fail — I just wrote some code

    A may not exist if you are not delegating control to a framework, or it may be quite deep. The same goes for C depending on what it is.

    So whether I scroll a little or a lot depends on the context.

On this discussion of what is more "natural" (ignoring how other languages represent it for a moment), especially in reference to

I actually feel the reversed order (_start at top) feels much more natural. I may have been taught weirdly by learning embedded development early, but I've always pictured the stack growing downwards from the top of memory space. Similarly people always talk about "exceptions bubbling up the stack", which makes no sense when the function that produced the exception/panic is sitting at the top of the stack.

Every debugger I've ever used either limits the number of frames it shows or has the stack trace in a separate scrollable window that starts scrolled to the top. Also, how come the command to move down the backtrace in gdb is up.

(gdb) bt
#0  ki_setup ()
#1   in test_runner
#2   in main
(gdb) up
#1   in test_runner
(gdb) up
#2   in main
4 Likes

So let me get this right ... The runtime stack frames cannot be treated as a historical context, and printing the optimized stack frames can suck for determining how the program entered the panic state?

Sounds like what we really need is for panic! to know to provide the nice Python-like stack traces for debug/non-optimized builds. Since hey, we're not optimizing out stack frames, so the stack will preserve call history. For release builds, you just have to suffer with the ugly optimized stack.