Experts catching panics wrong?

I though a new post would be best for this.

quinedot wrote:

"There are also a lot of tricky caveats around catching panics that even experts get wrong."

What exactly are you referring to? I think I remember there have been errors found in the Rust standard library ( in unsafe code where an untimely panic could cause UB ), but that doesn't make catching panics difficult.

( Section 3.1 "Panic Safety" here: https://taesoo.kim/pubs/2021/bae:rudra.pdf )

There're at lease two reasons why you should not assume that panics can be caught reliably.

  1. Binaries can be compiled with panic = "abort" mode, which immediately shutdown the process on panic without unwinding.

  2. Double panic, means panic during another panic unwinding, is always abort(shutdown) the process.

In conclusion, if you're writing library you should never assume panics can be caught as you have no control over the 1. case. On binaries you have control over the 1. case, but more place you use panic as a recoverable error the possibilities they overlap increases, which will trigger the 2. case. If some libraries uses panic as a recoverable error things get worse as even on binaries you don't have control over the 2. case.

8 Likes

What you say is true, but not really related to the post subject, which is "experts catching panics wrong". I am questioning whether this claim is accurate, and what quinedot meant by it.

What I described are the tricky cases where even experts can easily get it wrong easily.

2 Likes

One subtle hazard I've heard of is that catch_unwind is not sufficient to block all unwinding; if the panic payload (see panic_any()) has a Drop implementation that itself panics, and the catcher drops that value (as it usually would), then there's now a panic outside the catch_unwind, so if the goal of the catch was e.g. FFI safety, it hasn't achieved it.

struct Bomb;

impl Drop for Bomb {
    fn drop(&mut self) {
        panic!("Surprise!");
    }
}

fn main() {
    let _ = std::panic::catch_unwind(move || {
        std::panic::panic_any(Bomb);
    });
    println!("This won't be printed because the drop panic is outside the catch_unwind");
}

This is troublesome enough to consider changing the language to avoid it:

https://github.com/rust-lang/rust/issues/86027

https://github.com/rust-lang/lang-team/issues/97

8 Likes

This is certainly a hazard, but I think in practice it is likely to be rare. Typical drop implementations are in my experience not liable to panic. So I think it's mostly a theoretical ( albeit real ) concern. I am not aware of any cases where "experts" have got it wrong.

[ To justify the above, a panic typically arises from the discovery of some bad condition - drop implementations are generally housing-keeping functions that do not go looking for errors, they free up resources. If someone has an actual example of an expert encountering the issue in a real-world program, I would be somewhat surprised. The first link I would describe as a contrived example. the "Bomb" struct panics on drop. ]

If you're writing a library, and you're relying on catch_unwind for soundness (for example, any FFI wrapper that accepts a Rust function as a callback called from C must, since panics must not cross FFI boundaries), then your library is unsound if you don't handle this because your library being sound means that it can't cause UB no matter what any of the code interacting with it does.

(And, a more squishy-philosophy-of-software-engineering matter: programs get extremely complex and it really matters that we arrange that things are sound in 100% of cases rather than 99.9% of cases, because when you multiply together lots of those almost-perfect successes you get an uncomfortable failure rate. (0.999).powi(100) ≈ 0.90.)

And, if you look at the last few comments on "Never allow unwinding from Drop impls", they're from people who are concerned about the proposed change because they want things that panic on drop.

10 Likes

Yes, I see some kind of a debate there about the usefulness of panic on drop. I don't want to minimise the issues, but on the other hand I do think the problems could be described as "somewhat theoretical" ( but nevertheless deserving of attention ). I cannot imagine my code panicking in drop ( I only ever implemented drop once, and the drop in question doesn't even occur during a caught panic ).

Of course, others may have other experiences, I would be interested to hear about them.

As an example of experts catching panics wrong, Tokio has some code where it catches panics, but I'm pretty sure we don't catch panics from the destructor of the panic payload of a previously-caught panic.

It wouldn't cause memory unsafety if it happened, but you would probably lose one of your worker threads.

2 Likes

Are you saying there was a bug in the tokio code? What was the mistake?

If a Tokio task panics with a payload that panics again when dropped, then you will lose the worker thread it happened, is my guess based on how I recall the code was written.

4 Likes

Ah ok. I wouldn't call that a bug, it seems to me that a panic during an unwind is always going to be problematic and should be avoided ( if you want the panic to be caught ). Maybe it's possibly to deal with the situation, but I wouldn't want to depend on that.

But if it was in a situation where it caused UB, then that would certainly be a bug. Safe code causing memory errors/other UB is supposed to be impossible in safe Rust; any code that doesn't uphold that guarantee is considered unsound (i.e., buggy).

1 Like

From the thread I also linked to, there's the issues @kpreid pointed out (bugs in rustc/stdlib -- experts getting it wrong), followed by some conversations around the RUDRA paper.

To quote the Lang team lead in #97,

Code using catch_unwind is not typically prepared to handle an object that panics in its Drop impl. Even the standard library has had various bugs in this regard, and if the standard library doesn't consistently get it right, we can hardly expect others to do so.

That quote is in the context of removing that particular footgun, but there is no consensus yet and the footgun still exists.

Those are the citations I have on hand, but I've seen other examples via TWIR, changesets, blog posts, and the like.

I want to highlight this as well. Protecting an FFI boundary from unwind UB is one of the niche circumstances where catching panics is not only applicable, but necessary. And because it's a soundness issue, "eh, esoteric circumstances" doesn't cut it in a language like Rust, where safety is a core tenant.

It can be a struggle to recognize when coming from C/C++ say, where a lot of UB is "turned off" with compiler flags or just culturally ignored.

I do recognize that if there's not a risk of UB, it may not be worth one's effort, and slapping a good-enough sticker on it is a viable approach. However, catch_unwind does not protect you from unexpectedly broken logical invariants (as panics break the library API), so trying to recover can still be risky (prone to bugs).

9 Likes

In addition to the panic-on-drop stuff already discussed, writing unsafe code which can correctly handle the possibility of a panic is hard.

Any time you slightly bend the rules (e.g. I'm reallocating a buffer and temporarily have a dangling pointer which my Drop impl might free) you need to worry about how an untimely panic might let your users observe this transient state. This gets even harder when you realise that any functions provided by the user (e.g. via methods on a trait) may panic.

There used to be a problem with Vec's Drain iterator where forgetting the Drain would mean we never shift the items at the end down over the items that were already removed, letting you read uninitialised memory.

In practice you see the Pre-Pooping Your Pants pattern being used a lot, where you put the original object in a safe-but-neutered state and move the unsafe logic into some drop guard which will restore correctness when it is dropped.

2 Likes

Yes, it is necessary to make sure any persistent data structures (that is data structures that still exist after the unwind) are in a consistent state (rolled back) after an error, but I don't see that the use of catch_unwind rather than Result makes any difference there, it's still necessary either way. Maybe I am wrong about this though.

Can you please tell me how I can do that with "cargo test"
I don't want to set it in my Cargo.toml

[profile.release]
panic = 'abort'

I would like to use my "cargo test" command hopefully with flags to check my test function that catches a panic to prove it fails when panics abort.
`
Is there a way around the

warning: `panic` setting is ignored for `test` profile

issue?
Can test profiles be run with panic=abort?

Baby steps toward expert.

In Android, we compile tests with panic = abort, but we use our own build system rather than cargo. I don't know if cargo has an option for it.

I guess I just needed to ask. Usually as soon as I ask someone a question the answer comes to me.

RUSTFLAGS='-C panic=abort -Zpanic_abort_tests' cargo +nightly test

Will run tests with panic=about.

Baby steps toward being an expert....

1 Like

Now I need to figure out how to check if panic=abort is set for the compile. Maybe a cfg flag test of some sort is available. I don't want to include the catch code if catching will not work.