Yes, the problem is here:
graph
    .par_iter_mut()
    .zip(&mut path[..])
    .zip(&column_k[..])
    .enumerate()
    .for_each(|(id, ((rows, rows_path), ik))| {
        if id != k {
            rows.par_iter_mut()
                .zip(&mut rows_path[..])
                .zip(&row_k[..])
                .enumerate()
                .for_each(|(id, ((ij, ij_path), kj))| {
                    if id != k {
                        floyd_serial(ij, *ik, *kj, ij_path, coord);
                    }
                });
        }
    });
If I remove the inner par_iter, everything works fine, but I lose a lot of performance. For example, with 8192: using this code I get 1.05 s. Removing the inner par_ and using LTO="fat", I get 1.6 s.
I don't really understand why.
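For reference, this is roughly the shape of the variant with only the outer loop parallel and the inner loop serial. It is a minimal sketch, not the real code: it uses `std::thread::scope` from the standard library instead of rayon so that it has no dependencies, `relax` is a hypothetical stand-in for `floyd_serial`, the path matrix and `coord` are left out, and the names `step`, `floyd_warshall` and `threads` are made up for illustration. It says nothing about the timings above.

```rust
use std::thread;

/// Hypothetical stand-in for `floyd_serial`: relax d[i][j] through vertex k.
/// u32::MAX is treated as "no edge"; saturating_add keeps it from overflowing.
fn relax(ij: &mut u32, ik: u32, kj: u32) {
    let via_k = ik.saturating_add(kj);
    if via_k < *ij {
        *ij = via_k;
    }
}

/// One Floyd-Warshall step for a fixed k.
/// Rows are split across threads (outer parallelism only);
/// the columns of each row are processed serially.
fn step(graph: &mut [Vec<u32>], k: usize, threads: usize) {
    // Snapshots of row k and column k, like `row_k` / `column_k` in the post.
    let row_k: Vec<u32> = graph[k].clone();
    let column_k: Vec<u32> = graph.iter().map(|row| row[k]).collect();

    // Rows per thread, rounded up; at least 1 because chunks_mut(0) panics.
    let t = threads.max(1);
    let chunk = ((graph.len() + t - 1) / t).max(1);

    thread::scope(|s| {
        for (c, rows) in graph.chunks_mut(chunk).enumerate() {
            let (row_k, column_k) = (&row_k, &column_k);
            s.spawn(move || {
                for (offset, row) in rows.iter_mut().enumerate() {
                    let i = c * chunk + offset;
                    if i == k {
                        continue; // row k does not change in step k
                    }
                    let ik = column_k[i];
                    // Serial inner loop: no nested parallel iterator here.
                    for (j, (ij, kj)) in row.iter_mut().zip(row_k).enumerate() {
                        if j != k {
                            relax(ij, ik, *kj);
                        }
                    }
                }
            });
        }
    });
}

/// Runs every step k in order; the steps themselves must stay sequential.
fn floyd_warshall(graph: &mut [Vec<u32>], threads: usize) {
    for k in 0..graph.len() {
        step(graph, k, threads);
    }
}
```

Calling `floyd_warshall(&mut g, 2)` on a small square matrix `g` of `u32` distances, with `u32::MAX` for missing edges, updates `g` in place with the shortest distances.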