I am trying to parallelize an implementation of Gaussian elimination. I am aware that crates such as rayon may make this easier, but my goal is to use a mpsc::channel based implementation, so I chose to use a threadpool.
Here is an overview of the code base. Some of the features are redundant, but this code was ported from C, so I tried to maintain the structure.
The data structure passed around contains the size of the matrix (redundant), the matrix, four verification vectors (used to determine correctness), and the number of threads:
pub struct Data {
pub nsize: usize,
pub matrix: Vec<Vec<f64>>,
pub b: Vec<f64>,
pub c: Vec<f64>,
pub v: Vec<f64>,
pub swap: Vec<u64>,
pub num_threads: usize,
}
The sequential algorithm:
pub fn compute_gauss(data: &mut Data) {
for i in 0..data.nsize {
pivot(data, i);
let mut pivot_val;
for j in i + 1..data.nsize {
pivot_val = data.matrix[j][i];
data.matrix[j][i] = 0.0;
for k in i + 1..data.nsize {
data.matrix[j][k] -= pivot_val * data.matrix[i][k];
}
data.b[j] -= pivot_val * data.b[i];
}
}
}
the pivot(matrix,index)
function mutates the matrix in data
My current implementation of parallel gauss:
pub fn compute_gauss_p(data: Data) -> Data {
let num_threads = data.num_threads;
let pool = ThreadPool::new(data.num_threads);
let nsize = data.nsize;
let sdata = Arc::new(data);
for i in 0..nsize {
pivot(&mut *sdata, i); //compile error, as Arc does not implement DerefMut
for n in 0..num_threads {
let sdatacpy = Arc::clone(&sdata);
pool.execute(move || {
do_calc(sdatacpy, i + n);
});
}
pool.join();
}
Arc::try_unwrap(sdata).unwrap()
}
fn do_calc(data: Arc<Data>, i: usize) { //cannot derefence mut from Arc
let mut pivot_val;
let num_threads = data.num_threads;
for j in (i + 1..data.matrix.len()).step_by(num_threads) {
pivot_val = data.matrix[j][i];
data.matrix[j][i] = 0.0;
for k in i + 1..data.matrix.len() {
data.matrix[j][k] -= pivot_val * data.matrix[i][k];
}
data.b[j] -= pivot_val * data.b[i];
}
}
I have thought about using a Mutex to exclusivise the matrix, but that would sequentialize the work.
My next thought is to treat the matrix as a vector of mutex's, each containing a row. This would allow me to divide up the work. matrix would be:
Vec<Mutex<Vec<f64>>>
However, I am wondering if this is the correct path. I would like to avoid using unsafe pointer references (I could just use my C version) while continuing to use a paradigm which has mpsc::channel's (no rayon or other library that doesn't use channels).