How can std::thread::JoinHandle::join catch panics?

The code below compiles only if dyn Fn is UnwindSafe + RefUnwindSafe, because panic::catch_unwind requires that bound on the closure it runs before it will catch the panic.

use std::panic;
use std::panic::{UnwindSafe, RefUnwindSafe};

fn launch_closure(f: Box<dyn Fn() + UnwindSafe + RefUnwindSafe>) {
    let result = panic::catch_unwind(|| {
        f();
    });    
}

However, the std::thread::JoinHandle::join function is able to catch the panic even if the thread closure is not UnwindSafe + RefUnwindSafe. Its documentation says:

If the child thread panics, Err is returned with the parameter given
to panic!.

How?

I'd like to be able to know if my closure panicked, but UnwindSafe + RefUnwindSafe is too restrictive: I cannot use Condvar, for example.


Yeah, there is this implicit property that Send kind of implies UnwindSafe, and Sync, RefUnwindSafe, precisely for this reason. And then Send is &mut-transitive, but UnwindSafe is not :sweat_smile:

All that to say, {Ref,}UnwindSafe are marker traits that are used as kind of a "linting" trait, we could say, there is nothing strict about what they enforce.

The way to "silence" this lint is by using the AssertUnwindSafe wrapper:

use ::std::any::Any; // needed for the `dyn Any` in the return type

fn catch_unwind_not_unwind_safe<R, F> (f: F)
  -> Result<R, Box<dyn Any + Send>>
where
    F : FnOnce() -> R,
{
    use ::std::panic;
    panic::catch_unwind(panic::AssertUnwindSafe(f))
}

That being said, the lint is there for a reason; when doing this, you should strive to resume unwinding ASAP. That is, the following API is way safer / more sane:

fn peek_panic<R> (
    body: impl FnOnce() -> R,
    on_panic: impl FnOnce(&dyn ::core::any::Any),
) -> R /* or panic */
{
    use ::std::panic;
    match panic::catch_unwind(panic::AssertUnwindSafe(body)) {
        | Ok(ret) => ret,
        | Err(err) => {
            // `::unwind_safe` is an external crate, not part of `std`.
            ::unwind_safe::with_state(err)
                .try_eval(|err| on_panic(err)) // <- peek the panic payload info
                .finally(|err| panic::resume_unwind(err))
            ;
            unreachable!();
        },
    }
}
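
For readers who would rather not pull in an extra crate, here is a minimal std-only sketch of the same "peek, then keep unwinding" idea. The name `peek_panic_std` is made up for illustration, and the behaviour differs in one respect: if `on_panic` itself panics, the original payload is dropped and the new panic propagates instead.

```rust
use std::any::Any;
use std::panic::{self, AssertUnwindSafe};

/// Hypothetical std-only variant of `peek_panic`: runs `body`, lets
/// `on_panic` look at the payload if `body` panicked, then immediately
/// resumes unwinding with the same payload.
pub fn peek_panic_std<R>(
    body: impl FnOnce() -> R,
    on_panic: impl FnOnce(&(dyn Any + Send)),
) -> R {
    match panic::catch_unwind(AssertUnwindSafe(body)) {
        Ok(ret) => ret,
        Err(payload) => {
            // Peek at the payload without taking ownership of it...
            on_panic(&*payload);
            // ...and keep unwinding right away, so no caller gets to
            // observe state that `body` may have left inconsistent.
            panic::resume_unwind(payload)
        }
    }
}
```

Since the panic is re-raised, the `AssertUnwindSafe` here only silences the lint for the short window in which `on_panic` runs.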

I still don't get how join does it. Also, why do I have to resume unwinding as fast as possible?

When you run a thread, the standard library will spawn a new thread and wrap the closure you've provided in std::panic::catch_unwind(). That way any panics which reach the top of the thread's stack are always caught instead of just terminating the thread silently.

From there it's just a case of stashing away the result of the std::panic::catch_unwind() in a shared location. Later, when you call join(), the caller's thread blocks until a value is written to that location, then extracts it and gives that result to the caller.
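
To make that concrete, here is a rough sketch of the mechanism built on top of the public API. `my_spawn` and `MyJoinHandle` are hypothetical names, and the channel is just a convenient stand-in for the "shared location"; this is not the standard library's actual code.

```rust
use std::any::Any;
use std::panic::{self, AssertUnwindSafe};
use std::sync::mpsc::{self, Receiver};
use std::thread;

/// Hypothetical stand-in for `JoinHandle`: it only holds the receiving
/// end of the "shared location" the worker writes its outcome to.
pub struct MyJoinHandle<R> {
    result: Receiver<Result<R, Box<dyn Any + Send>>>,
}

impl<R> MyJoinHandle<R> {
    /// Blocks until the worker has stored its outcome, then returns it.
    pub fn join(self) -> Result<R, Box<dyn Any + Send>> {
        self.result
            .recv()
            .expect("worker exited without storing a result")
    }
}

/// Note that there is no `UnwindSafe` bound on `F`.
pub fn my_spawn<F, R>(f: F) -> MyJoinHandle<R>
where
    F: FnOnce() -> R + Send + 'static,
    R: Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // The wrapper silences the unwind-safety lint for the closure.
        let outcome = panic::catch_unwind(AssertUnwindSafe(f));
        // Stash `Ok(value)` or `Err(payload)`; ignore a dropped handle.
        let _ = tx.send(outcome);
    });
    MyJoinHandle { result: rx }
}
```

With this, `my_spawn(|| panic!("boom")).join()` would return `Err` carrying the payload, much like the real `join()`.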

It isn't 100% necessary to resume a panic on the caller's thread. If you want, you can choose to handle it differently (e.g. by logging the error and continuing), although a panicking thread often means something is broken, in which case you can resume unwinding to "gracefully" crash your application.
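
As an illustration of the "handle it differently" option, the sketch below turns the `Err` from `join()` into a plain message instead of re-raising it. `panic_message` and `run_and_report` are hypothetical helpers; the only assumption about the payload is the usual one, that `panic!` produces either a `&'static str` or a `String`.

```rust
use std::any::Any;
use std::thread;

/// Hypothetical helper: best-effort extraction of the panic message.
pub fn panic_message(payload: &(dyn Any + Send)) -> &str {
    if let Some(s) = payload.downcast_ref::<&'static str>() {
        *s
    } else if let Some(s) = payload.downcast_ref::<String>() {
        s.as_str()
    } else {
        "<non-string panic payload>"
    }
}

/// Hypothetical helper: runs `job` on its own thread and reports a
/// panic as `Err(message)` rather than propagating it.
pub fn run_and_report<R>(
    job: impl FnOnce() -> R + Send + 'static,
) -> Result<R, String>
where
    R: Send + 'static,
{
    thread::spawn(job)
        .join()
        .map_err(|payload| panic_message(&*payload).to_owned())
}
```

If the failure should still bring the program down, `std::panic::resume_unwind(payload)` on the joining thread re-raises it instead.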


As Michael said, they basically use the catch_unwind_not_unwind_safe pattern I showcased above.

Good question, I forgot to clarify this point.

The issue behind so-called "unwind safety" is that a panic represents a "soft crash" in the program: in the middle of some operation, the control flow is interrupted and starts unwinding up until the point where a catch_unwind() happens (such as a thread boundary, as you observed).

And this is problematic because we, as humans, can struggle to correctly account for these implicit control flow code paths in our logic.

I'll illustrate that with a dummy library example. First of all, we need an invariant to be upheld. Let's pick the classic example of a "cached" index stored next to a collection, which is guaranteed to always be a valid index / key of that collection:

mod lib {
    /// Invariants:
    ///   - `self.imax.map_or(true, |i| i < self.vec.len())`
    ///     _i.e._, if `imax` is not `None`, then it can index `self.vec`
    ///     inbounds.
    ///
    ///   - All the elements in `self.vec` are ≤ the one yielded by
    ///     `self.get_max()`.
    pub
    struct VecWithIndexOfMax<T : Ord> {
        vec: Vec<T>,
        imax: Option<usize>,
    }
    
    impl<T : Ord> VecWithIndexOfMax<T> {
        pub
        fn new ()
          -> VecWithIndexOfMax<T>
        {
            Self {
                vec: vec![],
                imax: None,
            }
        }
        
        /// Can't panic thanks to the invariant :)
        pub
        fn get_max (self: &'_ VecWithIndexOfMax<T>)
          -> Option<&'_ T>
        {
            Some(&self.vec.get(self.imax?).expect("\n    \
                UNREACHABLE – My beautiful lib should not hit this!\n\
            "))
        }
        
        pub
        fn push (self: &'_ mut VecWithIndexOfMax<T>, elem: T)
        {
            if self.get_max().map_or(true, |max| *max < elem) {
                self.imax = Some(self.vec.len());
            }
            self.vec.push(elem)
        }
        
        pub
        fn drop_last (self: &'_ mut VecWithIndexOfMax<T>)
        {
            drop(self.vec.pop());
            self.imax = self.imax.and_then(|idx| idx.checked_sub(1));
        }
    }
}
  • And now look at what happens when we push panic::catch_unwind(AssertUnwindSafe(…)) too far…

    The following code hits the UNREACHABLE expectation and crashes!

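    // Element type whose destructor panics (see the `Drop` impl below).
    // `Ord` is required by the lib, `Debug` by the `dbg!` call.
    #[derive(Debug, PartialEq, Eq, PartialOrd, Ord)]
    struct PanicOnDrop();

    let mut v = lib::VecWithIndexOfMax::new();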
    v.push(PanicOnDrop());
    let _ = ::std::panic::catch_unwind(::std::panic::AssertUnwindSafe(|| {
        impl Drop for PanicOnDrop {
            fn drop (self: &'_ mut PanicOnDrop)
            {
                panic!("panic inside `.drop_last()`");
            }
        }
        v.drop_last();
    }));
    eprintln!("It's okay, we survived the panic! :)\n");
    
    eprintln!("Hmm, what was the maximum of that `vec` again?");
    dbg!(v.get_max());
    eprintln!("Nice, all is good and well :)"); // <- NEVER REACHED
    


This is a typical case of a library not being fully "unwind safe": inside fn drop_last(), there is an "early-return-unwinding" / panicking code path which interrupts the flow of the function midway, and thus the code that amends the index is never reached.

The idea is that many libs cannot uphold an invariant at all times, precisely because one cannot update a collection and an index simultaneously / "atomically". So one of those two needs to be updated first (FWIW, for a simple case such as an index, a trick to minimize this issue is to "push-then-increment-idx" or "decrement-idx-then-pop", precisely to avoid issues such as this one). In the case of my example, the vec was popped first, and only afterwards was the index decremented (with the added "mistake" that an arbitrary drop was executed in between, which is the element introducing the unwind code path as shown in the diagram).
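
Here is one way that ordering trick could look when applied to the example, as a sketch rather than the one true fix. The struct is redefined so the snippet stands alone, and `drop_last` recomputes the index over the elements that will remain (my own choice, not the original lib's logic) before anything is popped.

```rust
pub struct VecWithIndexOfMax<T: Ord> {
    vec: Vec<T>,
    imax: Option<usize>,
}

impl<T: Ord> VecWithIndexOfMax<T> {
    pub fn new() -> Self {
        Self { vec: Vec::new(), imax: None }
    }

    pub fn get_max(&self) -> Option<&T> {
        self.vec.get(self.imax?)
    }

    pub fn push(&mut self, elem: T) {
        // The comparison may panic, but nothing has been mutated yet.
        let is_new_max = self.get_max().map_or(true, |max| *max < elem);
        // "push-then-set-idx": the index only ever names an existing slot.
        self.vec.push(elem);
        if is_new_max {
            self.imax = Some(self.vec.len() - 1);
        }
    }

    pub fn drop_last(&mut self) {
        if self.vec.is_empty() {
            return;
        }
        let new_len = self.vec.len() - 1;
        // 1. Compute the new index over the elements that will remain.
        //    If `cmp` panics here, `self` has not been touched.
        let new_imax = self.vec[..new_len]
            .iter()
            .enumerate()
            .max_by(|a, b| a.1.cmp(b.1))
            .map(|(i, _)| i);
        // 2. "fix-idx-then-pop": this index is valid for both the old
        //    and the new length.
        self.imax = new_imax;
        // 3. Only now run the arbitrary (possibly panicking) destructor.
        drop(self.vec.pop());
    }
}
```

With this ordering, a destructor that panics inside `drop_last()` can no longer leave `imax` pointing past the end of `vec`.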

And so, it is very typical for a library to temporarily break its own invariants, just to amend them right afterwards, before returning from the function. And, as I hope I have illustrated here, it is almost as typical for libraries not to be mindful of all the possible unwind code paths that would exit the function earlier than expected, before the invariants are restored.

  • In the worst case, those invariants are safety invariants, i.e., unsafe code relies on them (e.g., imagine using a get_unchecked(…) indexing in my example above: rather than a panic, we'd have had Undefined Behavior :scream:). This causes unsoundness, and it is a very serious bug of that library.

  • But, for many cases, the issue is not that fatal, and making it so all the unwind paths are well accounted for requires a daunting maintainability effort at best, or is straight up an impossible task at worst.

All this to say:

once a code path panics, libraries may end up in a weird, inconsistent and buggy state.

Logic bugs or further panics can then follow.

This is a horrific situation, and that's what the UnwindSafe "lint" trait is for: it makes it so you can't have mutable non-owning references (in the broader sense: &mut unique borrows, as well as &Cell, &RefCell, Rc<RefCell<…>>, etc.) inside the closure given to panic::catch_unwind. Indeed, the situation with library invariants being temporarily broken almost always stems from mutating APIs, so "linting" against those effectively prevents the most frequent footguns from catch_unwind.
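
A small sketch of what the lint accepts and rejects in practice (the three function names are made up for the example): shared borrows of plain data and owned captures pass as-is, while a `&mut` capture needs the explicit wrapper.

```rust
use std::panic::{self, AssertUnwindSafe};

/// Shared borrow of plain data: accepted without any wrapper.
pub fn sum_catching(v: &[i32]) -> std::thread::Result<i32> {
    panic::catch_unwind(|| v.iter().sum::<i32>())
}

/// Owned capture: also accepted. If the closure panics, the vector is
/// dropped during unwinding, so nobody can observe it half-updated.
pub fn push_owned(mut v: Vec<i32>, x: i32) -> std::thread::Result<Vec<i32>> {
    panic::catch_unwind(move || {
        v.push(x);
        v
    })
}

/// `&mut` capture: `panic::catch_unwind(|| v.push(x))` is rejected by
/// the compiler because `&mut Vec<i32>` is not `UnwindSafe`. The
/// wrapper is the explicit "I have thought about this" opt-out.
pub fn push_borrowed(v: &mut Vec<i32>, x: i32) -> std::thread::Result<()> {
    panic::catch_unwind(AssertUnwindSafe(|| v.push(x)))
}
```

In the third case the caller still holds `v` after a caught panic, which is exactly the situation the lint is trying to flag.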

And, as I mentioned, there is always the panic::AssertUnwindSafe(|| … ) escape hatch, for when the "lint" is being overzealous. But I hope you can see that abusing it is not great and can introduce logic bugs into your program, so it has to be used carefully, or the panic payload needs to be resumed ASAP, hence that sentence of mine :wink:
