Using Rayon in code locked by a Mutex

I'm hitting deadlocks when using Rayon: my program ends up in a situation where all Rayon threads are waiting for a mutex to be released, while the code holding the mutex is waiting for a Rayon thread to become available.

I think the situation boils down to something like this:

fn mutex_and_par() {
    some_mutex.lock().unwrap().par_iter().collect::<Vec<_>>();
}

collection.par_iter().for_each(|_| mutex_and_par());

What’s the best way to resolve this?

The obvious solution would be not to use Rayon while holding a Mutex, but enforcing that is hard in a large program, and I’d still like my mutex-locked lazy initialization to be parallelized.


I’m guessing this breaks down something like:

  • Thread 1 starts processing one of the for_each(mutex_and_par) calls.
    • grabs the lock
    • splits up the par_iter() into a few jobs
    • starts the collect() on one of those jobs
  • Another thread steals one of those jobs to help out
  • Thread 1 finishes its current job
    • looks for the other jobs and sees they’re stolen elsewhere
    • goes to steal more work itself while it’s waiting…
    • finds another of the for_each(mutex_and_par) calls
    • tries to grab the lock (again) … DEADLOCK

I think it’s akin to the discipline needed to avoid lock inversion or recursively locking a given mutex, except Rayon’s work-stealing makes that implicit. And yes, that’s really challenging. I’m not sure how we could really alleviate this either – ideas welcome!

One way you might be able to avoid rayon under your mutexes is to break up the read-modify-write components of your lazy initialization. That is, first lock and read if init is needed at all; if so, release the lock and do the initialization separately; then grab the lock again and write the initialized result. This assumes that your initialization is otherwise idempotent, because you may end up doing it multiple times – like in this recursive situation!
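That read-modify-write split can be sketched roughly like this (the `Lazy`/`compute` names are mine, not from the original code; this is a minimal illustration of the pattern, not a drop-in fix):

```rust
use std::sync::Mutex;

// Hypothetical lazily-initialized cache, holding no lock during init.
struct Lazy {
    slot: Mutex<Option<Vec<u32>>>,
}

impl Lazy {
    fn get(&self, compute: impl Fn() -> Vec<u32>) -> Vec<u32> {
        // 1. Lock and read: is initialization needed at all?
        if let Some(v) = self.slot.lock().unwrap().as_ref() {
            return v.clone();
        }
        // 2. The lock is released here; run the (possibly parallel,
        //    possibly duplicated) initialization without holding it.
        let value = compute();
        // 3. Lock again and write. If another thread raced us and won,
        //    keep its result -- this is why `compute` must be idempotent.
        self.slot.lock().unwrap().get_or_insert(value).clone()
    }
}
```

Since two racing threads may both run `compute`, this trades wasted work for never holding the mutex across a Rayon call.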

I filed a bug for us to track this further – feel free to continue discussing either here or there.


Interesting problems!

If I interpret it correctly, we accidentally run into depth-first scheduling, where the inner par_iter calls 'trap' threads, preventing the outer par_iter from generating new split workloads.

Looking at the code, and at how cuviper explains the deadlock, I see two general strategies for a way out, although I am completely unsure how feasible either is.


The first would be to make the Rayon scheduler somehow aware of the "depth" of the par_iter calls, and have it prioritize the shallowest/outermost par_iter first.
Then, all possible jobs will always have been queued before the locking happens.

However, I fear this would require some form of spooky action at a distance ((expensively) inspecting the stack? Some secret Rayon local storage?).
This scheduling change also wouldn't work universally: what happens in a different program where the lock-trapping happens in the outer par_iter?


The second strategy would be to give each par_iter its own queue, plus at least one dedicated thread, so any call site is guaranteed to have a thread available.
In this instance, almost all threads would still get trapped on the inner lock, but the one reserved outer thread would keep chugging along, eventually releasing the locks one at a time.
Deadlock avoided, at the cost of parallelism.

Another downside is that the dedicated thread sits wasted whenever its queue is empty while others are still full.


Now that I’ve written them out, I dislike both my suggestions, because they seem to add a lot of complexity to the current, elegantly simple, Rayon model, and they incur costs even when they aren’t needed.

I hope the rough ideas can spark better ideas :smile: