Rayon prevent wait

Hello! I would like something like OpenMP's "nowait" directive in Rayon, for example when using par_iter_mut. Is that possible?

Thanks.

Not directly; Rayon is not a crazy macro. But you can move the work you want to happen in parallel into the closure that runs in parallel.

Thank you for the reply!

But suppose I have two par_iter_mut calls. If I combine them and give each thread more work, I still have the same problem: some threads sit idle. That's why I want to simulate OpenMP's nowait, so that no threads are idle.

You need to show example code. I don't really understand exactly what the situation is.

Yes, sorry.

I have these two parallel sections.

    graph
        .par_iter_mut()
        .map(|row| row.get_unchecked_mut(k))
        .zip(column_k.par_iter_mut())
        .enumerate()
        .filter(|(id, _)| *id != k)
        .for_each(|(_, (col, item_k))| {
            floyd_serial(col, *col, kk);
            *item_k = *col;
        });

    graph
        .get_unchecked_mut(k)
        .par_iter_mut()
        .zip(row_k.par_iter_mut())
        .enumerate()
        .filter(|(id, _)| *id != k)
        .for_each(|(_, (row, item_k))| {
            floyd_serial(row, kk, *row);
            *item_k = *row;
        });

The first par_iter_mut surely does not need all the threads, so some of them will be idle. If I could insert a nowait as in OpenMP, that would solve the problem.

Use join. This function lets you define two pieces of code that will run in parallel.

    join(
        || {
            graph
                .par_iter_mut()
                .map(|row| row.get_unchecked_mut(k))
                .zip(column_k.par_iter_mut())
                .enumerate()
                .filter(|(id, _)| *id != k)
                .for_each(|(_, (col, item_k))| {
                    floyd_serial(col, *col, kk);
                    *item_k = *col;
                });
        },
        || {
            graph
                .get_unchecked_mut(k)
                .par_iter_mut()
                .zip(row_k.par_iter_mut())
                .enumerate()
                .filter(|(id, _)| *id != k)
                .for_each(|(_, (row, item_k))| {
                    floyd_serial(row, kk, *row);
                    *item_k = *row;
                });
        }
    );

Wow, thanks!

I have the following error:

    error[E0524]: two closures require unique access to `graph` at the same time
       --> src/blocked.rs:141:13
        |
    128 |         rayon::join(
        |         ----------- first borrow later used by call
    129 |             || {
        |             -- first closure is constructed here
    130 |                 graph
        |                 ----- first borrow occurs due to use of `graph` in closure
    ...
    141 |             || {
        |             ^^ second closure is constructed here
    142 |                 graph
        |                 ----- second borrow occurs due to use of `graph` in closure

Mutable access in Rust implies exclusive access, and anything else is undefined behavior. I see that you're already using unsafe to violate that assumption in some places.

rayon::spawn is Rayon's fire-and-forget primitive. Use rayon::scope instead if you need to run work asynchronously but then wait for it to finish.

If you have a mutable slice to distribute between tasks, you will need split_at_mut() (or chunks_mut()) to get two (or more) non-overlapping mutable sub-slices.
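A minimal sketch of the split_at_mut() idea, to show why the borrow checker accepts it. The function name process_halves and the per-element work are made up for illustration. It uses std::thread::scope only to stay dependency-free; the same two closures could be handed to rayon::join instead.

```rust
use std::thread;

/// Doubles the left half and negates the right half of `data` on two threads.
/// `split_at_mut` hands out two non-overlapping `&mut` sub-slices, so each
/// closure gets exclusive access to its own half.
/// (Hypothetical helper, not taken from the original code.)
fn process_halves(data: &mut [i32]) {
    let mid = data.len() / 2;
    let (left, right) = data.split_at_mut(mid);

    // The scope joins both threads before returning, so the borrows of
    // `left` and `right` cannot outlive `data`.
    thread::scope(|s| {
        s.spawn(|| left.iter_mut().for_each(|x| *x *= 2));
        s.spawn(|| right.iter_mut().for_each(|x| *x = -*x));
    });
}
```

Calling process_halves on a vector holding 1, 2, 3, 4 leaves it holding 2, 4, -3, -4.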

Hi @kornel, yes, but I need column k and row k of the matrix. How can I get both slices to process in parallel?

You can't do that easily here, because you are mutating graph in both arms of join. It is not safe to do so because you might be trying to read from/write to the same memory address from two threads simultaneously.

If both of these operations take a decent amount of time you might as well just run them one after the other.

Are they even non-overlapping at all in your case? I'm not sure I follow what your code is doing. What if you write to the ith element while processing the jth row, while another thread processes the jth element of the ith column?
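If the two regions really are disjoint (row k on one side, the column-k entries of every other row on the other), one way to tell the borrow checker so is to split row k away from the remaining rows first. The sketch below assumes the graph is a square matrix stored as a slice of Vec<u32>, which may not match the real block type; update_row_and_column and the placeholder arithmetic are hypothetical. std::thread::scope stands in for rayon::join to keep it dependency-free.

```rust
use std::thread;

/// Updates row `k` and column `k` of a square matrix on two threads.
/// The diagonal cell (k, k) lives in row `k` and is skipped, as in the
/// `filter(|(id, _)| *id != k)` calls of the original code.
/// (Hypothetical helper; the `+= 1` / `+= 100` stand in for the real work.)
fn update_row_and_column(graph: &mut [Vec<u32>], k: usize) {
    // Rows above k, then row k itself, then rows below k: three disjoint borrows.
    let (above, rest) = graph.split_at_mut(k);
    let (row_k, below) = rest
        .split_first_mut()
        .expect("k must be a valid row index");

    thread::scope(|s| {
        // Thread 1: every cell of row k except the diagonal.
        s.spawn(|| {
            for (j, cell) in row_k.iter_mut().enumerate() {
                if j != k {
                    *cell += 1;
                }
            }
        });
        // Thread 2: the column-k cell of every row other than row k.
        s.spawn(|| {
            for row in above.iter_mut().chain(below.iter_mut()) {
                row[k] += 100;
            }
        });
    });
}
```

No unsafe is needed here because the second thread never receives a reference to row k at all.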

You could use AtomicU32 elements in your graph, and iter() instead of iter_mut().
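A rough sketch of what the atomic variant could look like, assuming the per-element update can be expressed as a "keep the minimum" step (as shortest-path relaxation usually can). The name relax_shared is hypothetical, and Ordering::Relaxed is an assumption that should be checked against what the real algorithm needs. std::thread::scope is used instead of Rayon only to keep the example dependency-free.

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::thread;

/// Lowers every distance in `dist` to at most `candidate`, from two threads
/// that share the whole slice through `&`, not `&mut`.
/// (Hypothetical helper, not taken from the original code.)
fn relax_shared(dist: &[AtomicU32], candidate: u32) {
    thread::scope(|s| {
        // Both threads deliberately visit every element: overlapping access
        // is allowed because each update is a single atomic operation.
        for _ in 0..2 {
            s.spawn(|| {
                for d in dist.iter() {
                    // fetch_min atomically stores min(current, candidate).
                    d.fetch_min(candidate, Ordering::Relaxed);
                }
            });
        }
    });
}
```

Atomics make each individual update race-free; they do not by themselves guarantee that a multi-step read-modify-write sequence across several cells is consistent.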

If you're sure threads never touch the same memory, then use unsafe to cast mut protections away (which is actually unsafe, sadly).

For your single graph it's likely too much work, but usually when complex multi-threaded access is needed, it's possible to build a safe abstraction that enables it. rav1e did that to divide images into tiles:

https://blog.rom1v.com/2019/04/implementing-tile-encoding-in-rav1e/