Hi everyone, I am a beginner in Rust and looking for a bit of help regarding my code. I am trying to write high-performance code for a specific linear algebra operation (sparse-dense matrix multiplication for a custom sparse layout), but running into some issues regarding parallelization.
I have this rather straightforward function
use nalgebra::{DMatrix, DVector};
type Matrix = DMatrix<f32>;
type Vector = DVector<f32>;
fn diamul(ks: &[isize], xs: &[Vector], y: &Matrix, z: &mut Matrix) {
xs.iter()
.zip(ks.iter())
.for_each(|(x, &k)| _single_mul(k, x, y, z));
}
fn _single_mul(k: isize, x: &Vector, y: &Matrix, z: &mut Matrix) {
// roughly implementing z += x[:, None] * y[:, :] in NumPy-like form
}
which works as intended. For performance, I would like to convert the iteration into
xs.par_iter()
.zip(ks.par_iter())
using rayon
. However, this raises the following error:
error[E0596]: cannot borrow `*z` as mutable, as it is a captured variable in a `Fn` closure
I understand this is due to multiple threads wanting do act upon z
which is mutable, and that could create conflicts. However, here, I just want to accumulate the result of independent computations into z
, which should be parallelizable without problem. Is there a way I could achieve this?
I am providing the full code below in case that's useful. Thank you!
use criterion::{black_box, criterion_group, criterion_main, Criterion};
use rand::prelude::*;
use rayon::prelude::*;
use nalgebra::{DMatrix, DVector};
const N: usize = 1024;
const D: usize = 3;
type Matrix = DMatrix<f32>;
type Vector = DVector<f32>;
fn _get_slice(k: isize) -> (usize, usize, usize) {
if k >= 0 {
let idx_in = k as usize;
let idx_out = 0;
let size = N - (k as usize);
(idx_in, idx_out, size)
} else {
let idx_in = 0;
let idx_out = -k as usize;
let size = N - (-k as usize);
(idx_in, idx_out, size)
}
}
fn _single_mul(k: isize, x: &Vector, y: &Matrix, z: &mut Matrix) {
let (idx_in, idx_out, size) = _get_slice(k);
let _x = x.rows(idx_in, size);
let _y = y.columns(idx_in, size);
let mut _z = z.columns_mut(idx_out, size);
_z.column_iter_mut()
.zip(_y.column_iter())
.zip(_x.iter())
.for_each(|((mut z_col, y_col), x_val)| {
z_col.axpy(*x_val, &y_col, 1.0);
});
}
fn diamul(ks: &[isize], xs: &[Vector], y: &Matrix, z: &mut Matrix) {
xs.par_iter()
.zip(ks.par_iter())
.for_each(|(x, &k)| _single_mul(k, x, y, z));
}
fn benchmark_diamul(c: &mut Criterion) {
// Generate random diagonals and offsets, and other matrix
let mut rng = rand::thread_rng();
let ks = vec![-1, 0, 1];
let xs = (0..D)
.map(|_| Vector::from_fn(N, |_, _| rng.gen::<f32>()))
.collect::<Vec<_>>();
let y = Matrix::from_fn(N, N, |_, _| rng.gen::<f32>());
let mut z = Matrix::zeros(N, N);
c.bench_function("diamul", |b| {
b.iter(|| {
diamul(black_box(&ks), black_box(&xs), black_box(&y), black_box(&mut z));
})
});
}
criterion_group!(benches, benchmark_diamul);
criterion_main!(benches);