Cancellation safety of locks in `select!`

In the documentation for tokio’s select macro, it’s mentioned in the “Cancellation Safety” section that locking Mutex and RwLock is not cancellation safe and may lead to losing place in the queue. I assume this applies to something like

tokio::select! {
  mut locked = mutex.lock() => // blah
  // other branches
}

My question is: would I still lose my place in the queue if I avoid dropping the lock future?
In particular something like this

let mutex = tokio::sync::Mutex::new(123);

// create the lock future
let mut mutex_lock = std::pin::pin!(mutex.lock());

tokio::select! {
  mut locked = &mut mutex_lock => {
    // blah
    // once the lock future is used, replace it with a new one (if selecting in a loop)
    std::pin::Pin::set(&mut mutex_lock, mutex.lock());
  },
  // other branches
}

Here, the lock future is not dropped when select picks another branch. Is this enough the ensure cancellation safety (not losing my place in the queue)? Or am I missing some potential pitfalls?

playground link

Yes. This solves the queueing issue, but if any of the other branches contain an .await in the block to the right of the =>, then the code almost certainly is buggy and contains the bug known as FutureLock.

3 Likes

Oh wow I did not imagine something like FutureLock could happen. I was just planning to add something in the other branches that will lock the mutex. You saved me who knows how many hours of deadlock frustration. Thank you so much!!

select! is a dangerous tool :slight_smile:

3 Likes

I wonder if it would be possible to have a less footgun-prone alternative to select!, that perhaps start polling other branches again when the selected branch stops returning ready?

Even if you let the lock() call continue running, once it completes the guard still exists, and if nobody uses the guard it has the same issue.

Then you couldn't move a value from outer scope into the first active branch (imagine: what if other branches needed it too).

I guess a pattern of "have empty handlers, instead moving all async code to the corresponding branch expressions" would work, however ugly it is.

If you move the value the other branches can't use it. That would be prevented by Rust's affine type system.

But I don't think we should be content with status quo here. Rust tries to prevent footguns elsewhere, and sync rust generally does a really good job at that. For async rust we need to come up with thr right abstractions (possibly several different ones for different use cases of select) that avoids footguns.

Deadlocks can happen in sync rust too, but they are way less likely in practise. If you really want to avoid it you can do static lock ordering like Fuchsia's network stack apparently does, in the type system.

I don't think that would be enough in async rust, which means our abstractions aren't the right ones yet. Mind you, I don't know what the right abstractions are. I haven't worked enough in this area to have that expertise, and perhaps I don't have the needed smarts for it. But I think this is an area that needs further exploration.