Is this use of AssertUnwindSafe safe? (Wrapping a future that is `Send + 'static`)

Hello everyone,

I've implemented a function that takes a mpsc::Sender and a future, and catches panics of the future. It then sends any occurred panic or returned Error to the channel. This way it's easy to handle failures centrally, and to just return Result<(), _> from futures, which seems pretty nice for errors that can't be handled (this is mostly useful for tasks started via tokio::spawn that are detached).

To catch panics it is required that the future is UnwindSafe, however requiring that as a trait bound resulted in compiler errors (as UnwindSafe was not implemented for some types, because of shared mutability, I believe).

Therefore, I'm wrapping the future in AssertUnwindSafe, which works.

Now I'm wondering if that is safe to do in this context.

Here is the code of the main function:

pub async fn catch_failure<C, E, F>(channel: ErrorSender<C, E>, context: C, future: F)
where
    C: Send + 'static,
    E: Send + 'static,
    F: Future<Output = Result<(), E>>, // + std::panic::UnwindSafe, // Uncommenting this leads to compiler errors
{
    // Adding [AssertUnwindSafe] doesn't trigger any error, however:
    let reason = match AssertUnwindSafe(future).catch_unwind().await {
        Err(e) => ErrorReason::Panic(e),
        Ok(Err(e)) => ErrorReason::Error(e),
        Ok(Ok(())) => return (), // Everything is ok
    };

    let err = Error { reason, context };

    if let Err(SendError(err)) = channel.send(err).await {
        // Nothing else can be done, besides to panic:
        std::panic::panic_any(err);
    }
}

...and here is a full example with a demonstration: Playground

The future in question is Send + 'static, and I've read that this basically means, the future is unwind safe – however this discussion made me a bit worried, especially this paragraph:

UnwindSafe is an auto trait which will not be implemented for anything that has UnsafeCell and I think &mut like a lot of futures might. AssertUnwindSafe is a bit like impl UnwindSafe for MyType , basically saying to the compiler "I know, this has types that are not unwind safe, but I manually checked the code in this type and nothing can go wrong. It's fine to call catch_unwind on it, I promise." That's why it's kind of dodgy to do that on other peoples types, since we basically (manually) checked shit.

On the other hand, this comment:

Panic safety is often based around "you caught the panic and things were left in bad state you poked at later", but the intention here is that the "thread" of computation (the future) is just split across a few locations and we transfer the panic manually between those locations as an implementation detail. In that sense you can't easily poke at bad state because when your "thread" (future) panics it just keeps panicking.

...indicates (if I understand this correctly), that using AssertUnwindSafe is okay here (because the data isn't touched, and just forwarded back to the user of the function. So it's their job to make sure nothing bad happens).

However, I'm not 100% sure, so feedback would be great.

If someone has any other advice in regard to my code, I'd be interested in hearing that as well.

Is it safe? Well, you have no unsafe block, so the code doesn't have UB.

That said, your code can silence the kind of issue that unwind safety tries to protect you from. I don't know whether that matters to you - most people don't care about unwind safety.

You could add an F: UnwindSafe bound to your function if you don't want to use the AssertUnwindSafe utility.

2 Likes

Thank you, Alice. With "safe" I meant rather "correct", then "not unsafe Rust" (which was an unfortunate choice of words).

Somehow, yesterday I didn't get what was going on and was confused from all the things I've read (it was a bit late). But after a night of sleep and reading your message, I think I get it now.

The following code demonstrates how catch_failure could lead to bugs:

use std::sync::{Arc, Mutex};

#[derive(PartialEq)]
enum State {
    Idle,
    Active,
}

async fn test(state: Arc<Mutex<State>>, fail: bool) -> Result<(), &'static str> {
    if *state.lock().unwrap() != State::Idle {
        panic!("Invalid state!")
    }

    *state.lock().unwrap() = State::Active;

    if fail {
        panic!("Failed")
    }

    *state.lock().unwrap() = State::Idle;

    Ok(())
}

#[tokio::main]
async fn main() {
    // Lets imagine [Mutex] is not [UnwindSafe]:
    let state = Arc::new(Mutex::new(State::Idle));
    let (failure_tx, mut failure_rx) = tokio::sync::mpsc::channel(1024);

    // No problem:
    tokio::spawn(catch_failure(failure_tx.clone(), "Test 1", test(Arc::clone(&state), false))).await.unwrap();

    // Leads to invalid state:
    tokio::spawn(catch_failure(failure_tx.clone(), "Test 2", test(Arc::clone(&state), true))).await.unwrap();

    // In this case, users would have to fix the state themself:
    //*state.lock().unwrap() = State::Idle;

    // Will panic because of an invalid state:
    tokio::spawn(catch_failure(failure_tx.clone(), "Test 3", test(Arc::clone(&state), false))).await.unwrap();

    while let Some(failure) = failure_rx.recv().await {
        println!("'{}' failed with: {:?}", failure.context, failure.reason)
    }
}

Full example

So as you wrote, this is obviously an incorrect use of AssertUnwindSafe.

You could add an F: UnwindSafe bound to your function if you don't want to use the AssertUnwindSafe utility.

I'd love to, but unfortunately that would kill ergonomics, I believe.

your code can silence the kind of issue that unwind safety tries to protect you from. I don't know whether that matters to you - most people don't care about unwind safety.

Can you elaborate on this? Why would anybody not care about this kind of bug?

The only reasons that I could come up with are, that UnwindSafe kill ergonomics, and that UnwindSafe-related bugs probably don't happen very often (when mostly used to handle the failure of a bigger action).

Tokio has similar functionality with tokio::spawn / JoinHandle / JoinError. Is my assumption correct, that Tokio basically does the same as my catch_failure function?

I'm asking because I can't find any warning in the documentation of Tokio that mentions the possibility of an incorrect state, when JoinError is handled.

Currently, I believe, that my use of AssertUnwindSafe can lead to problems, but is okay in the context that it is used. Users just have to keep this in mind, when handling panics.

Would anybody disagree with that?

Why? Most values are UnwindSafe. Adding the bound is the right thing to do unless your abstraction inherently ensures unwind safety (which it apparently doesn't).

Yes, Tokio does the same thing and has a similar issue. The same applies to the std::thread::spawn function that Tokio's design is based on.

2 Likes

If I add the trait bound, I get errors because of the following types that are not UnwindSafe:

  • tokio::sync::OwnedSemaphorePermit (contains UnsafeCell<AtomicUsize>)
  • tokio_tasker::Tasker (contains UnsafeCell<FuturesUnordered<tokio::task::JoinHandle<()>>>)
  • reqwest::Client (contains (dyn CookieStore + 'static))
  • deadpool_bolt::PoolInner / bolt_client::Client (a database connection pool, contains a trait object)

...which are all shared between tasks.

I have no idea about how I'd make this all UnwindSafe (and this is just for two functions and a few hundred lines of code. I'm expecting that there will be much more types that don't implement UndwindSafe).

My thinking was, "If it's good enough for Tokio, it should be more than good enough for me".

Do you have any suggestion about how to overcome this problem?

I could just wrap tokio::spawn, but if that function does the same thing as I do currently (which seems to be the case), there wouldn't be any benefit.

Well, you could wrap each of those values in an AssertUnwindSafe, or your own custom struct that you explicitly impl the UnwindSafe trait for.

You can also submit issues/PRs to those crates to mark the types as unwind safe. For example, the semaphore permit certainly should be unwind safe, because it correctly handles panics.

1 Like

I meant more something like "how to make it unwind safe without blindly wrapping every unwind unsafe value in AssertUnwindSafe".

You can also submit issues/PRs to those crates to mark the types as unwind safe.

Yes, in theory that would be possible. In practice, I have no clue how to make sure something is really unwind safe (so I can't fix it myself). And if there are already 5 types that are not unwind safe now (within a few hundred lines of code), there will be probably 100 when my application is finished (so even reporting it would be a lot of work). Additionally, I guess, it is not realistic to expect every type to be UnwindSafe (but I might be wrong).

My thinking therefore is: "If std::thread and tokio "don't care" about this, why do I need to?".

I'm pretty sure, every programming language that has exceptions and mutable types has the same problem. And so far I've never heard that this lead to bugs (I'm sure it did, just not very often).

The thread module and tokio don't even seem to mention the problem that could happen because of unwind unsafely (like all other programming languages I know).

Basically it's a tradeoff between 'correctness' and 'pragmatism', I think.

the semaphore permit certainly should be unwind safe, because it correctly handles panics.

Do you know how it archives this?

I feel to really solve this problem properly, something similar to database transactions would be needed.

The semaphore permit is unwind safe because the authors of that type made sure to think about what happens if something panics and implement it in a way that doesn't break even if a panic happens.

2 Likes

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.