Mutex starvation

Ref: multithreading - Why do Rust mutexes not seem to give the lock to the thread that wanted to lock it last? - Stack Overflow

Here's an actual case of non-fair mutex starvation being a problem. This is from my metaverse client.

Partial starvation, viewed with the Tracy profiler.

Two threads here, "Movement" and "Client", both want the lock on "World". (The main thread doesn't need access to that, and it's steadily rendering at 60 FPS.) The "Movement" thread gets a message on a zero-length channel to wake it up at the beginning of each render cycle. The render thread continues on. The movement thread wakes up, tries to get a lock on World, does a bit of motion updating, releases the World lock, and goes back to waiting for the message that tells it to do another frame.
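The wakeup mechanism described above can be sketched like this (names hypothetical, not from the actual client): a zero-capacity `sync_channel` makes each send rendezvous with a receive, so the movement thread runs exactly once per render cycle it is signaled for.

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

// Sketch of the per-frame wakeup: the render side signals a zero-capacity
// channel at the start of each cycle, and the movement thread does one
// update per signal.
fn run_frames(frames: u32) -> u32 {
    let (frame_tx, frame_rx) = sync_channel::<()>(0);

    let movement = thread::spawn(move || {
        let mut updates = 0;
        // Wait for the per-frame signal; exits when the sender is dropped.
        while frame_rx.recv().is_ok() {
            // ... lock World, update motion, release the lock ...
            updates += 1;
        }
        updates
    });

    for _ in 0..frames {
        // A zero-capacity channel: send blocks until the movement
        // thread actually receives, so signals can't pile up.
        frame_tx.send(()).unwrap();
    }
    drop(frame_tx); // close the channel so the movement thread exits
    movement.join().unwrap()
}

fn main() {
    assert_eq!(run_frames(3), 3);
}
```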

Meanwhile, the "Client" thread is processing incoming UDP messages from the server. At this time, there are a lot of them, so the Client thread can go compute-bound. It gets a message from the UDP queue, locks the World, handles the message, unlocks World, and goes back for another message.

When the Client thread releases the World lock, the Movement thread, which is blocked on it, ought to get in. Sometimes it does, but often it doesn't. You can see this in the Tracy output. Notice the Movement thread with a long delay in "World lock wait", a profiling scope for just waiting for the lock. The wait lasts about 8 frame times (you can see 4 in this screenshot) while UDP message after UDP message is processed, each locking and releasing World. This starves out the Movement thread for a while. Not forever. Eventually the Movement thread gets a turn. But it has missed frames, which both looks bad and is logged as an error.
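The pattern can be reproduced in miniature with std's `Mutex` (all names hypothetical): one thread hammers the lock in a tight loop, standing in for the Client thread's per-message lock/unlock, while another thread measures how long a single `lock()` call has to wait. Because std's mutex is not fair, the hammering thread can often reacquire the lock immediately on its next iteration, so the waiter's delay may span many hammer iterations; the exact behavior depends on the OS scheduler, so this demo only measures, it doesn't assert starvation.

```rust
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::{Duration, Instant};

// Measure how long a single lock() waits while another thread
// repeatedly locks and unlocks the same mutex.
fn measure_wait() -> Duration {
    let world = Arc::new(Mutex::new(0u64));

    let hammer = {
        let world = Arc::clone(&world);
        thread::spawn(move || {
            // One lock/unlock per iteration, like one UDP message each.
            for _ in 0..100_000 {
                *world.lock().unwrap() += 1;
            }
        })
    };

    thread::sleep(Duration::from_millis(1)); // let the hammer get going
    let start = Instant::now();
    let guard = world.lock().unwrap(); // the "Movement" thread's wait
    let waited = start.elapsed();
    drop(guard);
    hammer.join().unwrap();
    waited
}

fn main() {
    println!("movement-side lock() waited {:?}", measure_wait());
}
```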

Added a 1ms sleep with the lock unlocked, to force a trip through the CPU dispatcher.

If I add a 1ms delay on each UDP cycle, the starvation goes away: the sleep gives the Movement thread a chance to grab the lock, and it now gets in on every cycle.
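A sketch of that workaround (the loop body is a stand-in for the real UDP handler): the key point is that the sleep happens after the World guard is dropped, so the blocked waiter can be scheduled and acquire the lock.

```rust
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Duration;

// Client-side loop with the 1ms workaround: handle each message under
// the lock, then sleep with the lock *released*.
fn client_loop(world: Arc<Mutex<u64>>, messages: u32) {
    for _ in 0..messages {
        {
            let mut w = world.lock().unwrap();
            *w += 1; // stand-in for handling one UDP message
        } // World guard dropped here, before the sleep
        // Forces a trip through the scheduler so a blocked
        // waiter can get the lock before the next iteration.
        thread::sleep(Duration::from_millis(1));
    }
}

fn main() {
    let world = Arc::new(Mutex::new(0));
    client_loop(Arc::clone(&world), 5);
    assert_eq!(*world.lock().unwrap(), 5);
}
```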

I suspect the solution to this is to use parking_lot's fair mutex here, and to do more of the UDP processing before locking.
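The second part of that, doing more of the processing before locking, could look roughly like this (types and names are hypothetical, not from the actual client): decode the raw datagram with no lock held, so the World lock is only taken for the short state update. The first part would be swapping `std::sync::Mutex` for `parking_lot::FairMutex`, which hands the lock to waiters in order at the cost of some throughput.

```rust
use std::sync::Mutex;

struct World {
    objects_updated: u64,
}

enum Update {
    Move { id: u32, x: f32, y: f32 },
}

// Parse the datagram into an Update with no lock held (stand-in for
// real protocol decoding).
fn decode(datagram: &[u8]) -> Option<Update> {
    if datagram.len() >= 12 {
        Some(Update::Move { id: 1, x: 0.0, y: 0.0 })
    } else {
        None
    }
}

// Lock World only for the (short) state change itself.
fn apply(world: &Mutex<World>, update: Update) {
    let mut w = world.lock().unwrap();
    match update {
        Update::Move { .. } => w.objects_updated += 1,
    }
}

fn main() {
    let world = Mutex::new(World { objects_updated: 0 });
    let datagram = [0u8; 16];
    if let Some(u) = decode(&datagram) {
        // decoding done before locking
        apply(&world, u);
    }
    assert_eq!(world.lock().unwrap().objects_updated, 1);
}
```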

This was unexpected. I didn't realize the mutex system was that unfair.


Std's mutexes aren't fair mutexes, and while that usually doesn't matter, it can cause starvation in certain circumstances. Using parking_lot's fair mutexes is probably the best option if starvation is an issue, although do be aware that fairness can cause other issues, such as lock convoying, if you aren't careful.