Why does `panic_handler` return the never type `!`?

https://docs.rust-embedded.org/book/start/panicking.html

In programs without standard library, however, the panicking behavior is left undefined. A behavior can be chosen by declaring a #[panic_handler] function. This function must appear exactly once in the dependency graph of a program, and must have the following signature: fn(&PanicInfo) -> ! , where PanicInfo is a struct containing information about the location of the panic.

The implementation of the example panic_handler has loop {} at its end. Wouldn't this hang the thread forever?

Yes, it would. That's the point.

Note that the example code on that page is not an example panic handler; it's an example main(). Either way, the point is that in a bare-metal system, when the program wants to stop, there is no operating system to return to, so looping forever is the only thing that goes nowhere else in the program.

In some cases, you can tell the processor to halt its clock (thus stopping execution and saving energy), but it may or may not actually have that capability, and it might be woken up again by an interrupt, depending on the details of the hardware. So, loop {} is the only fundamental way to provide an end.

Most embedded programs' main()s will have an infinite loop that does whatever the job is, rather than an infinite loop that does nothing. But if there's a panic, the panic handler must do something like looping or commanding a reboot, after it's done the job of reporting the panic as far as it can.
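A minimal sketch of that "report as far as you can, then loop" shape. FixedBuf and on_panic are hypothetical names, and the handler is gated on target_os = "none" only so the same file still builds on a hosted target; a real firmware crate would also be #![no_std] and would send the report to whatever UART or debug probe it actually has.

```rust
use core::fmt::{self, Write};

/// Hypothetical fixed-size buffer, so the report can be formatted without a heap.
pub struct FixedBuf {
    data: [u8; 64],
    len: usize,
}

impl FixedBuf {
    pub const fn new() -> Self {
        FixedBuf { data: [0; 64], len: 0 }
    }

    pub fn as_str(&self) -> &str {
        // Only whole UTF-8 characters are copied in, so this should not fail.
        core::str::from_utf8(&self.data[..self.len]).unwrap_or("")
    }
}

impl Write for FixedBuf {
    fn write_str(&mut self, s: &str) -> fmt::Result {
        // Truncate silently when full: a panic handler must not panic itself.
        let room = self.data.len() - self.len;
        let mut n = s.len().min(room);
        while !s.is_char_boundary(n) {
            n -= 1;
        }
        self.data[self.len..self.len + n].copy_from_slice(&s.as_bytes()[..n]);
        self.len += n;
        Ok(())
    }
}

// Compiled only for bare-metal targets (assumption: target_os = "none").
#[cfg(target_os = "none")]
#[panic_handler]
fn on_panic(info: &core::panic::PanicInfo) -> ! {
    let mut report = FixedBuf::new();
    let _ = write!(report, "{}", info);
    // A real handler would transmit `report.as_str()` here, if it can.
    loop {
        core::hint::spin_loop();
    }
}
```

The return type `!` is what forces the trailing loop: there is no value the handler could return, and no caller it would be safe to return to.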

The expectation of panicking is that it's unsafe to continue execution, so we must diverge -- returning the never type means we won't return at all. Unwinding is one way to accomplish that, which might be caught in a safe place. A loop {} is a simplistic sort of halt and catch fire.
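To make "diverge" concrete, here is a small std-based sketch (give_up, parse_port and parse_port_caught are made-up names). A function returning `!` can stand in for a value of any type, because control never comes back from it; unwinding still leaves the function, just not through a normal return.

```rust
use std::panic;

/// Diverges: there is no value of type `!`, so this can never return normally.
fn give_up(reason: &str) -> ! {
    panic!("giving up: {}", reason)
}

/// The `Err` arm type-checks as `u16` because `!` coerces to any type.
fn parse_port(text: &str) -> u16 {
    match text.parse::<u16>() {
        Ok(port) => port,
        Err(_) => give_up("not a valid port"),
    }
}

/// With panic=unwind, the divergence can be caught in a "safe place".
fn parse_port_caught(text: &str) -> Option<u16> {
    panic::catch_unwind(|| parse_port(text)).ok()
}
```

Even when the panic is caught, execution resumes at the catch_unwind call, never at the line after give_up(...).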

Oops, that's a mistake. Here is an example panic handler with a loop at the end.

looping forever is the only thing that goes nowhere else in the program

I'm not clear on when this panic_handler is called. Isn't it possible to implement an unwinding panic handler like std does, so that the stack unwinding eventually hits a catch_unwind and the program can recover from the panic? If the panic handler immediately enters an infinite loop, there is no way to recover from a panic.

There is no guarantee that panics can be caught. You shouldn't rely on panic catching for error handling. Trivially, building with a panic=abort profile makes all panics fatal.

It certainly is. But that still doesn't continue execution at the point where the panicking function is called. So the type is always ! anyway.

This is a popular claim, but I think it needs a caveat. A specific application, targeting a platform which supports unwinding, can always choose “We have a need for catch_unwind, so we must build with panic=unwind.” A library can even do that too and document its requirement. The thing that shouldn't be done is requiring unwinds when you're writing a general-purpose library — but not all Rust code is general-purpose libraries.
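One way an application could make that requirement explicit instead of leaving it in a README, as a sketch: cfg(panic = "...") is a real configuration predicate on recent stable toolchains, while the error message and the helper name panics_can_unwind are just illustrative.

```rust
// Refuse to build at all under panic=abort, since catch_unwind would be useless.
#[cfg(not(panic = "unwind"))]
compile_error!("this application relies on catch_unwind; build it with panic = \"unwind\"");

/// Reports which strategy the crate was compiled with.
pub fn panics_can_unwind() -> bool {
    cfg!(panic = "unwind")
}
```

This turns a silent behavior change into a build failure, which seems reasonable for a binary crate and much less so for a general-purpose library.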

I feel like the caveat needs a caveat too:

Panics in drop implementations while panicking will always trigger an abort, so not all cases are covered by this.
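A sketch of how a Drop implementation can avoid that double panic by checking std::thread::panicking() first. Connection, use_connection and survives_unwind are hypothetical names used only for illustration.

```rust
use std::panic::{self, AssertUnwindSafe};
use std::sync::atomic::{AtomicBool, Ordering};

struct Connection<'a> {
    healthy: bool,
    skipped_check: &'a AtomicBool,
}

impl Drop for Connection<'_> {
    fn drop(&mut self) {
        if std::thread::panicking() {
            // Already unwinding: a second panic here would abort the process,
            // so record that the check was skipped and return quietly.
            self.skipped_check.store(true, Ordering::SeqCst);
            return;
        }
        assert!(self.healthy, "connection dropped in a bad state");
    }
}

fn use_connection(skipped: &AtomicBool) {
    let _conn = Connection { healthy: false, skipped_check: skipped };
    panic!("request handler bug");
}

/// Returns true if the panic was caught and the drop check was skipped.
fn survives_unwind() -> bool {
    let skipped = AtomicBool::new(false);
    let result = panic::catch_unwind(AssertUnwindSafe(|| use_connection(&skipped)));
    result.is_err() && skipped.load(Ordering::SeqCst)
}
```

Without the panicking() check, the assert! in drop would fire during unwinding and the process would abort before catch_unwind ever saw the first panic.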

In spirit, though, I agree: as someone writing a long-running web server process, I need to recover from panics that happen while handling a request.

EDIT to add:

In some cases, you can tell the processor to halt its clock (thus stopping execution and saving energy), but it may or may not actually have that capability, and it might be woken up again by an interrupt, depending on the details of the hardware.

In a security context, hardware may also provide a kill function that makes it self-destruct (for instance, to prevent attackers from later extracting secrets).

Unfortunately, most code that isn't written in that spirit ends up being a complete mess or at least contributing to it. When transitive dependency number 628937 has yet another caveat regarding:

  • a library that needs to be installed by the user
  • an environment variable to set
  • a specific version of the toolchain to build with

then the last thing I want to see is "oh and by the way, you MUST NOT build with a panic=abort profile". How in the world am I then supposed to remember and formalize all those details, and make reproducible builds (or build successfully at all)?

I also feel that the "but we need catch_unwind" argument is not legitimate at all. It invariably signals that someone is trying to write Java in Rust. Panics are not the primary error handling mechanism in this language, and they should be regarded as a best-effort reporting/debugging tool for non-recoverable errors.

It's nice when the proverbial web server doesn't crash on panics because the web framework stuffs the entire event loop in a giant catch_unwind(), but:

  1. the programmer writing the endpoint handlers should not have written panicking code anyway; and
  2. s/he might just as well have written directly-aborting code, so all bets are off anyway when someone is desperately trying to avoid termination by "catching" stuff. It's simply not possible to do reliably, and claiming that ability is nothing but an illusion.

Bugs exist.

As a programmer you have various ways of handling this reality but you cannot escape it.

In a long-lived application with many ancillary features that manipulates user data, aborting on the first bug, even one in a subtask, loses user data and causes unavailability, and can reasonably be seen as the wrong answer. That doesn't mean it has to be a disorganised catch_unwind fest. Historically, the OS process has been the only practical isolation level, but that is mostly a consequence of unsafe-by-default languages, where memory errors are global to the process. In Rust, by contrast, a finer isolation granularity is very practical.

To your points: yes, dependencies can sometimes abuse panics, although this is less prevalent than abuse of exceptions in C++. And yes, dependencies could be written to abort directly (although I have yet to encounter that; the existence of panic and the ability to implement it as abort seem to be enough for people who would otherwise abort directly), or could cause an abort through the kind of condition described above (e.g. a panic in a drop while already panicking). That doesn't mean recovering from panics is impractical, as long as what constitutes a "task" is correctly defined.

I'm sorry, I was unclear by trying to be too general. Let me specify some concrete categories:

  • Application code, that is part of Cargo binary targets, can reasonably rely on being compiled with the configuration its Cargo.toml specifies.
  • So too can unpublished libraries that are part of the same workspace as the above.

The thing I am objecting to is “no guarantee” as a blanket claim, which makes it sound as if unwind catching were flaky. It's not flaky — it works in well-defined cases and doesn't work in other ones. Yes, it won't catch an abort, but it will catch all unwinds, and unwinds happen in predictable conditions.

I entirely agree that, if you're publishing a library, you should not make its operation depend on catch_unwind if at all possible, because then you don't know which one of those cases you're in. And I also agree that panics are not appropriate for reporting foreseen error conditions.

Bugs happen. The choice is between crashing a single request handler and recovering instantly with catch_unwind, or crashing the entire server process, dropping all connections, and waiting for the whole process to restart. In C/C++ the latter is generally better, because latent corruption may cause knock-on effects later. In Rust, request handlers are much more isolated, so crashing a single handler and keeping the rest of the process alive is a much better option. My understanding is that this is exactly what Erlang does to achieve nine nines of uptime.
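Roughly what the per-request isolation looks like, as a sketch with made-up names (handle_request, serve); real frameworks wrap their task or connection boundary in a similar way, but this is not any particular framework's code.

```rust
use std::panic::{self, AssertUnwindSafe};

/// A handler with a bug: it panics on input it did not expect.
fn handle_request(body: &str) -> usize {
    if body.is_empty() {
        panic!("bug: empty body was not validated");
    }
    body.len()
}

/// Runs one request; a panic becomes an error response instead of killing the server.
fn serve(body: &str) -> Result<usize, String> {
    panic::catch_unwind(AssertUnwindSafe(|| handle_request(body))).map_err(|payload| {
        // Panic payloads are usually a &str or a String; anything else is opaque.
        if let Some(msg) = payload.downcast_ref::<&str>() {
            format!("500: {}", msg)
        } else if let Some(msg) = payload.downcast_ref::<String>() {
            format!("500: {}", msg)
        } else {
            "500: unknown panic".to_string()
        }
    })
}
```

After serve("") returns its error, a following serve("next") still works in the same process. This only holds when the crate is built with panic=unwind and the panic is not one of the aborting cases mentioned earlier in the thread.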

That a request handler may crash the entire process in some cases doesn't mean that always crashing the entire process is better than crashing it only sometimes.

And that's not what I claimed.

My problem is with code that claims that it "needs" catching, especially for correctness or soundness. Since panics can't always be caught, relying on catching them for anything but a best-effort recovery/reporting attempt is an error.

I misunderstood your previous post. I only know of one project which actually needs to catch panics: rust-analyzer. At various points during a read request, it checks whether the request has been cancelled, for example because the source code changed. If so, it raises a panic with a special payload, and the main loop catches this panic and turns it into a cancelled response to the editor. This is entirely fine, because read requests don't modify any state, so everything is left in a consistent state. Using panics instead of a MaybeCanceled return type avoids tainting everything with that return type. It also enables cancelling within Option::map() and iterators. I can understand if you don't like using panics this way though.
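A rough sketch of that pattern, not rust-analyzer's actual code: Cancelled, Response, check_cancelled, compute and run_query are all hypothetical names. It uses std::panic::resume_unwind to start unwinding with a custom payload without invoking the panic hook.

```rust
use std::panic::{self, AssertUnwindSafe};
use std::sync::atomic::{AtomicBool, Ordering};

/// Special payload that marks a cancellation rather than a bug.
#[derive(Debug)]
struct Cancelled;

#[derive(Debug, PartialEq)]
enum Response {
    Done(u32),
    Cancelled,
}

/// Called at convenient points deep inside a read-only computation.
fn check_cancelled(flag: &AtomicBool) {
    if flag.load(Ordering::SeqCst) {
        panic::resume_unwind(Box::new(Cancelled));
    }
}

/// The check works inside a closure without changing any return type.
fn compute(flag: &AtomicBool) -> u32 {
    (1..=4u32)
        .map(|x| {
            check_cancelled(flag);
            x * 2
        })
        .sum()
}

/// "Main loop" side: turn the special payload into a response, re-raise real panics.
fn run_query(flag: &AtomicBool) -> Response {
    match panic::catch_unwind(AssertUnwindSafe(|| compute(flag))) {
        Ok(value) => Response::Done(value),
        Err(payload) => match payload.downcast::<Cancelled>() {
            Ok(_) => Response::Cancelled,
            Err(other) => panic::resume_unwind(other),
        },
    }
}
```

Note that compute still returns a plain u32, which is the "no tainting" benefit described above; the cost is that the whole scheme silently stops working under panic=abort.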
