I'm a little confused about the concept of mutex poisoning, and didn't quite understand even after reading the Rustonomicon.
According to the docs, if I understand correctly, a mutex can become poisoned if another thread panics while holding the lock. But if there's a panic, the whole program terminates, so why should we care?
What's more, in practice, what should I do with the result returned from lock()? Currently I just unwrap() it, but I follow a code style that asks to use unwrap as little as possible, so how could I recover if lock() fails?
I also found that the parking_lot crate does not fail when locking. What does it do instead?
The poisoning is a signal that informs you that some thread panicked while holding the lock, and that your mutex contents may be broken (due to your own code not properly handling panics). The error that lock() returns also contains the mutex guard, so you can ignore the poisoning by taking the guard out of the poison error.
In parking_lot, they simply unlock it normally when you panic while holding it.
That's not true: only the thread in which the panic occurs is terminated (unless that is the main thread, or the program is built with panic = "abort").
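To make that concrete, here is a minimal sketch, assuming the default unwinding panic strategy. The function name worker_panic_is_contained is made up for illustration. It spawns a worker that panics and checks that the spawning thread merely sees an Err from join() and keeps running.

```rust
use std::thread;

/// Spawns a worker that panics and reports whether the spawning thread
/// observed that panic as an `Err` from `join` (instead of dying itself).
/// Hypothetical helper, only for illustration.
fn worker_panic_is_contained() -> bool {
    let handle = thread::spawn(|| -> u32 {
        // Unwinds and terminates only this worker thread.
        panic!("worker failed")
    });

    // `join` returns `Err(payload)` when the worker panicked;
    // the calling thread continues normally.
    handle.join().is_err()
}
```

Calling worker_panic_is_contained() from main returns true, and main goes on executing afterwards. The panic message is still printed to stderr by the default panic hook.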
The Err variant of the Result returned by lock() contains a PoisonError<MutexGuard> which allows you to get a MutexGuard anyway. Knowing this, you can fix whatever you might have to fix in case another thread panicked while holding the lock. If you don't care to handle that case then you can just do .unwrap_or_else(|e| e.into_inner()). Note that .unwrap() is not that bad of a choice either: if another thread panicked then you probably have something wrong going on, so propagating the panic can be a wise choice.
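A small sketch of the "ignore the poison" option described above. The helper names poison and read_ignoring_poison are invented for this example; the standard library calls are lock(), into_inner() and is_poisoned().

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Poisons the mutex by panicking in a worker thread while the guard is alive.
/// Hypothetical helper, only for illustration.
fn poison(counter: &Arc<Mutex<i32>>) {
    let c = Arc::clone(counter);
    let _ = thread::spawn(move || -> () {
        let mut guard = c.lock().unwrap();
        *guard += 1;
        // The guard is dropped during unwinding, which sets the poison flag.
        panic!("worker panicked while holding the lock");
    })
    .join();
}

/// Reads the value whether or not the mutex is poisoned.
fn read_ignoring_poison(counter: &Mutex<i32>) -> i32 {
    // On `Err`, take the guard out of the `PoisonError` and carry on.
    let guard = counter.lock().unwrap_or_else(|e| e.into_inner());
    *guard
}
```

After poison(&counter) on a counter that started at 0, counter.lock() returns Err, while read_ignoring_poison(&counter) still returns 1: the lock was released, and the write made before the panic is visible.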
Generally, all/most Rust code style guidelines will request that. However, poisoning is an exception — it doesn't mean that you aren't handling something that you did wrong. It means that you are propagating a panic someone else initiated. Thus, in general, it's fine to unwrap the result of lock().
Now I'm even more confused. You mean that even if a panic happens, the destructors (drop functions) of local variables still get called, so the lock guard releases the lock anyway, right? If so, the lock is released perfectly well, so why is it called poisoned?
The idea is that &mut self functions in Rust will usually assume they run to completion in order to restore invariants. E.g. assume an accounting struct that supports transferring money from one account to another with a &mut self method. If that would panic in the middle of such a transfer, it can mean that the whole accounting structure suddenly contains more money overall.
To help prevent such situations, there are the auto traits UnwindSafe and RefUnwindSafe. A mutable reference in Rust is not UnwindSafe, which is supposed to make it harder to capture something mutably in a closure and then execute that closure with catch_unwind, which would let the broken state of such an accounting structure become visible to other code.
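Here is a rough sketch of that scenario. Ledger, transfer and total_after_caught_panic are made-up names, and the panic is simulated with a flag. Since the closure captures the ledger mutably, catch_unwind only accepts it once it is wrapped in AssertUnwindSafe, which is the explicit opt-in the auto traits are meant to force.

```rust
use std::panic::{self, AssertUnwindSafe};

/// Hypothetical two-account ledger; the invariant is that `a + b` stays constant.
struct Ledger {
    a: u64,
    b: u64,
}

impl Ledger {
    /// Moves `amount` from `a` to `b`. If `fail` is set, it panics after the
    /// credit but before the debit, leaving the invariant broken.
    fn transfer(&mut self, amount: u64, fail: bool) {
        self.b += amount;
        if fail {
            panic!("simulated failure between credit and debit");
        }
        self.a -= amount;
    }
}

/// Catches the panic and returns the total that other code can now observe.
fn total_after_caught_panic() -> u64 {
    let mut ledger = Ledger { a: 100, b: 0 };

    // The closure captures `&mut ledger`, which is not `UnwindSafe`, so
    // passing it to `catch_unwind` directly does not compile.
    // `AssertUnwindSafe` is the explicit "I know what I'm doing" wrapper.
    let result = panic::catch_unwind(AssertUnwindSafe(|| ledger.transfer(30, true)));
    assert!(result.is_err());

    ledger.a + ledger.b
}
```

total_after_caught_panic() returns 130 rather than 100: the credit happened, the debit did not, and the code after catch_unwind gets to see that state.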
The poisoning of Mutexes serves the same purpose. If you have a global Mutex<AccountingStructure> and a money transfer panics (idk, perhaps due to integer overflow, or there were some custom callbacks involved that could panic…), then the Mutex gets into a poisoned state to indicate to any future user that, if the type contained in the mutex can be left in “broken” states, it might be in such a broken state now, because some operation accessing the mutex’s contents did panic.
Hence, when writing code accessing this Mutex, you can either conservatively always unwrap when locking, so that any poisoning results in propagating the panic, or, if you know more about the contained data type and how it’s used throughout the rest of the program, and thus are certain that there are no invariants that can be broken, you can ignore the poisoning and access the value anyway.
In C++ panics are called exceptions, and they have basically the same unwinding behavior as in Rust. This is one reason why (C++) std::lock_guard exists, and is better than manually locking and unlocking a std::mutex: if the code holding a lock_guard returns early, throws an exception, or jumps past the unlock point, its destructor will still be called, releasing the lock. But in code that uses plain lock/unlock it is easy for an overlooked exception to accidentally unwind past the unlock, leaving the mutex locked (and possibly causing UB, if the exception terminates the thread).
C++ std::lock_guard does not distinguish between "normal" cleanup (return/continue/goto) and exception-unwinding, so when an exception is thrown while the lock is held the mutex will just be unlocked (I'm pretty sure, anyway, based on reading docs; I haven't tested this). This is fine and is also how parking_lot's Mutex works. But if you're using a mutex to preserve some "library level" invariant, and an unexpected panic happens while that invariant is temporarily broken, it might be useful to signal that. So Rust's Mutex does distinguish between normal control flow and panic-unwinding, and sets the poison flag only in the second case.
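The difference between the two exit paths can be checked on the Rust side with a short sketch (poisoned_after is a name invented here): the same worker either leaves the critical section normally or panics inside it, and only the second case sets the poison flag.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Runs a worker that takes the lock and either returns normally or panics
/// while the guard is alive, then reports whether the mutex is poisoned.
/// Hypothetical helper, only for illustration.
fn poisoned_after(panic_while_locked: bool) -> bool {
    let data = Arc::new(Mutex::new(Vec::<u32>::new()));
    let worker = Arc::clone(&data);

    let _ = thread::spawn(move || {
        let mut guard = worker.lock().unwrap();
        guard.push(1);
        if panic_while_locked {
            // The guard is dropped while unwinding: unlock and set the flag.
            panic!("unwinding with the guard alive");
        }
        // Normal path: the guard is dropped here, unlock only.
    })
    .join();

    data.is_poisoned()
}
```

poisoned_after(false) returns false and poisoned_after(true) returns true. In both cases the lock itself has been released.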
Please correct me if I'm wrong. So a "poisoned lock" indicates that the operation it protected may have failed part-way through, not that the lock itself is functionally broken. So even if a lock is poisoned, it still works for other threads (if you choose to ignore the poison state)?
Yeah, the lock still works fine. If the lock wasn’t unlocked, you would just keep blocking; getting a PoisonError means you successfully locked the lock just fine and can access it safely (in terms of memory safety), just the value it contains might be logically broken/inconsistent in some ways.
You can ignore the poisoning by turning the PoisonError<MutexGuard<'_, T>> into a MutexGuard<'_, T> with PoisonError::into_inner and then access the value in the mutex just fine. Even if inconsistent states are possible in your use-case you can also decide to still handle the PoisonError by calling some sort of recovery/cleanup method that brings the protected value back into a non-broken state.
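A sketch of the recovery variant, under made-up assumptions: Stats caches the sum of its values, a worker panics after pushing a value but before updating the cache, and lock_and_repair (a hypothetical helper) recomputes the cache whenever lock() reports poisoning.

```rust
use std::sync::{Arc, Mutex, MutexGuard};
use std::thread;

/// Hypothetical type with an invariant: `sum` equals the sum of `values`.
struct Stats {
    values: Vec<u64>,
    sum: u64,
}

impl Stats {
    /// Recovery step: rebuild the cached sum from the source of truth.
    fn recompute(&mut self) {
        self.sum = self.values.iter().sum();
    }
}

/// Locks the mutex; if it is poisoned, repairs the value before handing it out.
fn lock_and_repair(stats: &Mutex<Stats>) -> MutexGuard<'_, Stats> {
    match stats.lock() {
        Ok(guard) => guard,
        Err(poisoned) => {
            let mut guard = poisoned.into_inner();
            guard.recompute();
            guard
        }
    }
}

/// Breaks the invariant in a panicking worker, then repairs it.
/// Returns the repaired sum and whether the mutex is still flagged as poisoned.
fn repaired_sum_and_flag() -> (u64, bool) {
    let stats = Arc::new(Mutex::new(Stats { values: vec![1, 2], sum: 3 }));
    let worker = Arc::clone(&stats);

    let _ = thread::spawn(move || -> () {
        let mut guard = worker.lock().unwrap();
        guard.values.push(10);
        panic!("simulated failure before `sum` was updated");
    })
    .join();

    let guard = lock_and_repair(&stats);
    (guard.sum, stats.is_poisoned())
}
```

repaired_sum_and_flag() returns (13, true). The value is consistent again, but into_inner does not reset the poison flag, so later lock() calls keep returning Err and lock_and_repair simply recomputes again. Recent Rust versions also have Mutex::clear_poison for resetting the flag, which this sketch does not use.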
Well, it depends on what exactly “properly taking into account that something could panic” means. AFAIK it’s mostly only important to make sure that panics can’t result in memory unsafety (something you’ll only need to worry about when you use unsafe code in the first place). Restoring invariants might even be impossible unless you execute more code that could ultimately result in double-panic. So in light of ensuring proper cleanup, it’s sometimes better to leave things in an inconsistent state rather than risking an abort.