Hello! I have a problem and need to ask for help.

I have the following code:

```
fn floyd_parallel(graph: &mut Matrix) {
    // The k loop stays sequential.
    (0..BLOCK_SIZE).for_each(|k| unsafe {
        // Snapshot row k and column k before the rows are mutated.
        let row_k = *graph.get_unchecked(k);
        let column_k = get_column(&graph, k);
        // Outer parallel loop over the rows.
        graph.par_iter_mut().enumerate().for_each(|(i, rows)| {
            let ik = *column_k.get_unchecked(i);
            // Inner parallel loop over the elements of row i.
            rows.par_iter_mut()
                .zip(row_k.par_iter())
                .for_each(|(ij, kj)| {
                    let sum = ik + *kj;
                    if sum < *ij {
                        *ij = sum;
                    }
                });
        });
    });
}
```

My problem is that if I remove the last `par_iter_mut`, the program runs faster. But the incredible thing is that if I replace all the `par_iter_mut` with `iter_mut`, the code is much faster (almost double).
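For reference, this is roughly what I mean by the fully sequential variant. It is only a minimal sketch: `N`, the `Matrix` alias, the `u32` element type and `INF` are assumptions made for illustration (my real definitions are not shown here), and it reads `row[k]` directly instead of calling `get_column`.

```rust
// Assumed for illustration: a small fixed-size matrix of u32 distances.
const N: usize = 4;
type Matrix = [[u32; N]; N];
// "No edge" marker, chosen so that INF + INF does not overflow a u32.
const INF: u32 = u32::MAX / 2;

/// Floyd-Warshall with plain iterators (no rayon).
fn floyd_sequential(graph: &mut Matrix) {
    for k in 0..N {
        // Copy row k so the rows can be mutated while it is being read.
        let row_k = graph[k];
        for row in graph.iter_mut() {
            // Element (i, k), read before row i is updated.
            let ik = row[k];
            for (ij, kj) in row.iter_mut().zip(row_k.iter()) {
                let sum = ik + *kj;
                if sum < *ij {
                    *ij = sum;
                }
            }
        }
    }
}

fn main() {
    let mut g: Matrix = [
        [0, 3, INF, 7],
        [8, 0, 2, INF],
        [5, INF, 0, 1],
        [2, INF, INF, 0],
    ];
    floyd_sequential(&mut g);
    println!("{:?}", g);
}
```

On this small example graph the result should be the all-pairs shortest distances, which is a handy baseline to compare the parallel versions against.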

Why does this happen? What am I missing? The same logic works perfectly with OpenMP in C, but with rayon I'm having problems.

I can't parallelize the k loop because each iteration depends on the previous one.
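To make the structure I am aiming for concrete, here is a hedged sketch that keeps the k loop sequential and only splits the rows across threads. It deliberately uses `std::thread::scope` from the standard library instead of rayon so that it compiles without extra crates; `N`, `Matrix`, `INF`, `THREADS` and the function name are all assumptions made up for this example.

```rust
use std::thread;

// Assumed for illustration: matrix size, element type and thread count.
const N: usize = 4;
type Matrix = [[u32; N]; N];
const INF: u32 = u32::MAX / 2; // INF + INF still fits in a u32
const THREADS: usize = 2;

/// Sequential k loop; within each k step, blocks of rows are updated in parallel.
fn floyd_row_chunks(graph: &mut Matrix) {
    // Ceiling division so every row lands in some chunk.
    let rows_per_thread = (N + THREADS - 1) / THREADS;
    for k in 0..N {
        // Snapshot of row k, shared read-only by all threads in this step.
        let row_k: [u32; N] = graph[k];
        let row_k_ref = &row_k;
        thread::scope(|s| {
            for chunk in graph.chunks_mut(rows_per_thread) {
                s.spawn(move || {
                    // Each thread owns a disjoint block of rows.
                    for row in chunk {
                        let ik = row[k];
                        for (ij, kj) in row.iter_mut().zip(row_k_ref.iter()) {
                            let sum = ik + *kj;
                            if sum < *ij {
                                *ij = sum;
                            }
                        }
                    }
                });
            }
        });
        // thread::scope joins all threads here, before step k + 1 starts.
    }
}

fn main() {
    let mut g: Matrix = [
        [0, 3, INF, 7],
        [8, 0, 2, INF],
        [5, INF, 0, 1],
        [2, INF, INF, 0],
    ];
    floyd_row_chunks(&mut g);
    println!("{:?}", g);
}
```

Spawning fresh threads at every k step has a cost of its own, so this is meant to show the dependency structure (parallel over rows, barrier between k steps), not to be a tuned implementation.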

Thank you very much to everyone.