Panic: Unwind vs. Abort


#1

Abort sounds great, but I’m having trouble understanding when I would want to use this.

The linked blog post says:

Why would you want to do this? Remember that panics are for unexpected problems, and for many applications, aborting is a reasonable choice. With an abort, less code gets generated, meaning that binary sizes are a bit smaller, and compilation time is ever-so-slightly faster.

I understand that with abort the compiler just doesn’t generate the stack unwinding code that it was making before, but I don’t understand what that is meant to be used for.

Is abort merely a shortcut that generates less code? What situations wouldn’t I want to unwind in? If I have some code that is pretty much guaranteed not to panic except in the worst possible cases, should I switch it to panic=abort?
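For context, here is a minimal sketch of how the setting is selected and how code can tell which strategy it was built with. The helper name `panic_strategy` is made up for illustration; the `cfg!(panic = "...")` check assumes a reasonably recent stable compiler (Rust 1.60 or later).

```rust
// In Cargo.toml, the strategy is chosen per profile, for example:
//
//     [profile.release]
//     panic = "abort"

/// Reports which panic strategy this binary was compiled with.
/// `panic_strategy` is a hypothetical helper, not a standard library API.
fn panic_strategy() -> &'static str {
    if cfg!(panic = "abort") {
        "abort"
    } else {
        "unwind"
    }
}

fn main() {
    println!("compiled with panic={}", panic_strategy());
}
```

Without the Cargo.toml entry, the default strategy on most targets is unwind.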


#2

Unwinding panics enable an application thread to shut down in a relatively clean way. All allocated system resources are reclaimed, all application objects are properly dropped, and so on. In addition, panics stop at the boundary of the offending thread, rather than killing the whole application process. All of this means that if all objects have sensible destructors, application recovery from a panic is possible, although difficult.

If your application is designed for it, you can detect a thread panic and restart the offending thread, hoping that operation will resume properly. This Erlang-like approach to fault tolerance can be relevant in situations where shutting down the application is unacceptable, such as in critical systems where people’s lives depend on the application continuing to run.
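A minimal sketch of that detect-and-restart idea, assuming the default panic=unwind build. The `supervise` function and the failing worker are hypothetical names for illustration; the only standard library behavior relied on is that `JoinHandle::join` returns `Err` when the thread panicked.

```rust
use std::thread;

/// Runs `job` on a fresh thread and restarts it after a panic,
/// up to `max_attempts` times. Returns the attempt that succeeded, if any.
fn supervise<F>(max_attempts: u32, job: F) -> Option<u32>
where
    F: Fn(u32) + Send + Clone + 'static,
{
    for attempt in 1..=max_attempts {
        let job = job.clone();
        // join() yields Err if the thread panicked. This only works when
        // panics unwind; with panic=abort the whole process is already gone.
        let outcome = thread::spawn(move || job(attempt)).join();
        if outcome.is_ok() {
            return Some(attempt);
        }
        eprintln!("worker panicked on attempt {}, restarting", attempt);
    }
    None
}

fn main() {
    // Hypothetical worker that fails on its first two attempts.
    let result = supervise(5, |attempt| {
        if attempt < 3 {
            panic!("simulated failure");
        }
    });
    println!("{:?}", result);
}
```

A real supervisor would also need to decide what shared state is still trustworthy after the panic, which is the hard part.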

With aborts, there is no such possibility of application recovery. As soon as some piece of code aborts, the application process is instantly killed, which means that achieving fault tolerance requires much more elaborate multi-process designs. In addition, because resource destructors are not run, the whole system can be left in an inconsistent state, which means that restarting the application may be highly non-trivial.

To summarize, you should only enable abort-on-panic (panic=abort) in situations where you really do not care about your application crashing instantly AND potentially trashing any hardware/OS state that it was manipulating at crash time.


#3

Thanks! Does abort leak memory?


#4

It cannot leak RAM, because the OS (not the application) takes care of freeing all remaining heap-allocated memory on process shutdown.


#5

So if an application only uses memory and doesn’t use any files or anything, is abort safe to use?


#6

Yes. Basically, you should be careful with abort if your application modifies the state of the system it’s running on in some way.

Correct use of Drop + unwind allows you to make sure that you leave the system in a consistent state in such a case, whereas abort won’t let you clean up after yourself.
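To make this concrete, here is a small sketch in which a guard's `Drop` runs while a panicking thread unwinds. `Cleanup` and `cleanup_ran_after_panic` are hypothetical names, and the atomic flag merely stands in for real external state such as a lock file or a device mode.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical guard standing in for some external state to restore.
struct Cleanup {
    done: Arc<AtomicBool>,
}

impl Drop for Cleanup {
    fn drop(&mut self) {
        // Runs on normal scope exit and during unwinding.
        // It does not run if the process aborts.
        self.done.store(true, Ordering::SeqCst);
    }
}

/// Panics on a worker thread while holding the guard,
/// then reports whether the cleanup ran.
fn cleanup_ran_after_panic() -> bool {
    let done = Arc::new(AtomicBool::new(false));
    let guard = Cleanup { done: Arc::clone(&done) };
    let worker = thread::spawn(move || {
        let _guard = guard;
        panic!("simulated failure while holding the guard");
    });
    let _ = worker.join(); // Err(..) because the thread panicked
    done.load(Ordering::SeqCst)
}

fn main() {
    println!("cleanup ran: {}", cleanup_ran_after_panic());
}
```

Built with panic=abort, the same program would terminate at the `panic!` without ever calling `drop`.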


#7

Any application should be able to recover from a crash. Even if the application is perfect and never crashes, the operating system might, or the power could go out. So compiling with aborts really shouldn’t be a problem. (If you are concerned about the consistency of information stored in files, use SQLite.)

However, some applications are intended to run for a long time, and in some cases, it might be beneficial to catch errors from panicking threads and resume operation as if nothing bad had happened. I expect that all kinds of servers are in this category.
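As a rough sketch of what "catch and resume" could look like inside a single thread, the following uses `std::panic::catch_unwind` around each request. The `handle` and `serve` functions are hypothetical, and this only works when the binary is built with panic=unwind.

```rust
use std::panic;

/// Hypothetical request handler with a bug on one specific input.
fn handle(request: u32) -> u32 {
    if request == 13 {
        panic!("bug triggered by request {}", request);
    }
    request * 2
}

/// Serves every request, turning a panic into an error for that request only.
fn serve(requests: &[u32]) -> Vec<Result<u32, String>> {
    requests
        .iter()
        .map(|&r| {
            // catch_unwind stops the unwinding here and returns Err.
            panic::catch_unwind(move || handle(r))
                .map_err(|_| format!("request {} failed", r))
        })
        .collect()
}

fn main() {
    for outcome in serve(&[1, 13, 4]) {
        println!("{:?}", outcome);
    }
}
```

The closure here captures only a plain integer, so unwind safety is not a concern; handlers that touch shared mutable state need more care.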

We’ll have to see how difficult it is to write unwind-safe code in Rust in practice. Back when people started with C++ and exceptions, things looked much easier than they turned out to be.


#8

Note that in Rust memory allocation failure aborts regardless of this setting:

fn main() {
    // Asks for 2^40 elements, far more memory than most machines have.
    vec![0; 1 << 40];
}

fatal runtime error: out of memory

so you can’t build life-and-death critical systems or unkillable servers :frowning:


#9

Unkillable servers? Sounds like something out of a sci-fi movie. :open_mouth:


#10

@kornel: Many critical systems allocate all of their memory (or at least, whatever is necessary to the mission-critical part) from static buffers for exactly this reason. If a key part of your application relies on dynamic memory allocation, there is little you can do in the event of malloc failure.
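One way to picture the static-buffer approach is a fixed-capacity container whose storage is part of the value itself, so it never calls the allocator. `FixedBuf`, `Full`, and `try_extend` below are hypothetical names for illustration, not an existing crate's API.

```rust
/// Error returned when the buffer has no room left.
#[derive(Debug, PartialEq)]
struct Full;

/// Fixed-capacity byte buffer backed by an inline array (no heap allocation).
struct FixedBuf<const N: usize> {
    data: [u8; N],
    len: usize,
}

impl<const N: usize> FixedBuf<N> {
    const fn new() -> Self {
        FixedBuf { data: [0; N], len: 0 }
    }

    /// Appends `bytes` if they fit; otherwise leaves the buffer unchanged.
    fn try_extend(&mut self, bytes: &[u8]) -> Result<(), Full> {
        let end = self.len.checked_add(bytes.len()).ok_or(Full)?;
        if end > N {
            return Err(Full);
        }
        self.data[self.len..end].copy_from_slice(bytes);
        self.len = end;
        Ok(())
    }

    fn as_slice(&self) -> &[u8] {
        &self.data[..self.len]
    }
}

fn main() {
    // All storage is reserved up front; running out is an ordinary error value.
    let mut buf = FixedBuf::<4>::new();
    println!("{:?}", buf.try_extend(b"abc"));
    println!("{:?}", buf.try_extend(b"de"));
    println!("{:?}", buf.as_slice());
}
```

Because `new` is a `const fn`, such a buffer could also be placed in a `static` behind a suitable lock rather than on the stack.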


#11

@fweimer: Actually, I would love to see hardware, OS and API designs where every single operation is transactional and guaranteed to leave the system in a consistent state.

Unfortunately, we are not there yet, which is why we have UPSes to enable clean shutdown on power failure, and a growing number of fault-tolerance features in OSes (driver isolation w/ watchdog & auto-restart, journaling file systems…).


#12

That’s an approach for flight control systems, but for lots of other software, like web servers, there’s a lot of middle ground between provably never failing and being unreliable by design.

I process images on virtual servers, and very often run into images that require buffers larger than the free RAM on the VM (the processing pipeline is complicated, so it’s not always directly related to image dimensions).

Currently these parts use a C library that handles OOM fine. Even on Linux. My servers run out of memory thousands of times per day, and it’s all stable. Too-large requests are gracefully denied, and all other requests run uninterrupted.

If I port this to Rust my server will kill itself thousands of times per day, and abruptly terminate not only requests it can’t serve, but also all other requests that were in flight at the time.

But to stay on topic: panic=abort is a bad idea for web servers. A web server could use one thread per request and isolate requests, if it weren’t for poor handling of OOM in Rust.
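For what it's worth, later versions of the standard library (Rust 1.57 and up) offer fallible reservation on collections, which allows denying a too-large request without aborting. The sketch below assumes such a compiler; `try_alloc_buffer` is a hypothetical helper name. On systems with memory overcommit, the reservation can succeed even when physical memory is short, so this is not a complete answer.

```rust
use std::collections::TryReserveError;

/// Tries to allocate a zeroed buffer of `len` bytes,
/// reporting failure as an error instead of aborting the process.
fn try_alloc_buffer(len: usize) -> Result<Vec<u8>, TryReserveError> {
    let mut buf = Vec::new();
    // Returns Err on allocator failure or capacity overflow.
    buf.try_reserve_exact(len)?;
    // Capacity is already at least `len`, so this does not reallocate.
    buf.resize(len, 0);
    Ok(buf)
}

fn main() {
    // A request that fits is served.
    match try_alloc_buffer(16) {
        Ok(buf) => println!("allocated {} bytes", buf.len()),
        Err(e) => println!("request denied: {}", e),
    }
    // An impossibly large request is denied; the process keeps running.
    match try_alloc_buffer(usize::MAX) {
        Ok(buf) => println!("allocated {} bytes", buf.len()),
        Err(e) => println!("request denied: {}", e),
    }
}
```

Ordinary allocating calls such as `vec![...]` and `Vec::push` still abort on allocation failure, so every large allocation in the pipeline would have to go through a path like this.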