 # Parallel Gaussian elimination

I am trying to parallelize an implementation of Gaussian elimination. I am aware that crates such as rayon might make this easier, but my goal is an mpsc::channel-based implementation, so I chose to use a threadpool.

Here is an overview of the code base. Some of the features are redundant, but this code was ported from C, so I tried to maintain the structure.

The data structure passed around contains the size of the matrix (redundant), the matrix, four verification vectors (used to determine correctness), and the number of threads:

```rust
pub struct Data {
    pub nsize: usize,
    pub matrix: Vec<Vec<f64>>,
    pub b: Vec<f64>,
    pub c: Vec<f64>,
    pub v: Vec<f64>,
    pub swap: Vec<u64>,
}
```

The sequential algorithm:

```rust
pub fn compute_gauss(data: &mut Data) {
    for i in 0..data.nsize {
        pivot(data, i);

        let mut pivot_val;

        for j in i + 1..data.nsize {
            pivot_val = data.matrix[j][i];
            data.matrix[j][i] = 0.0;
            for k in i + 1..data.nsize {
                data.matrix[j][k] -= pivot_val * data.matrix[i][k];
            }
            data.b[j] -= pivot_val * data.b[i];
        }
    }
}
```

The `pivot(data, index)` function mutates the matrix in `data`.
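Since `pivot` isn't shown, here is a hedged sketch of what it plausibly does, inferred from the elimination loop above: that loop multiplies by `data.matrix[i][k]` without ever dividing by the pivot element, which suggests `pivot` normalizes row `i` after choosing it. The partial-pivoting details and the use of `swap` here are assumptions, not the original code:

```rust
// (Data repeated from above so this sketch is self-contained)
pub struct Data {
    pub nsize: usize,
    pub matrix: Vec<Vec<f64>>,
    pub b: Vec<f64>,
    pub c: Vec<f64>,
    pub v: Vec<f64>,
    pub swap: Vec<u64>,
}

// Hypothetical `pivot`: partial pivoting on column `index`, then normalizing
// row `index` so the elimination loop can skip the division.
pub fn pivot(data: &mut Data, index: usize) {
    // Pick the row at or below `index` with the largest |value| in this column.
    let mut best = index;
    for j in index + 1..data.nsize {
        if data.matrix[j][index].abs() > data.matrix[best][index].abs() {
            best = j;
        }
    }
    if best != index {
        data.matrix.swap(index, best);
        data.b.swap(index, best);
        data.swap.swap(index, best); // record the permutation for verification
    }
    // Normalize so that matrix[index][index] == 1.0.
    let p = data.matrix[index][index];
    for k in index..data.nsize {
        data.matrix[index][k] /= p;
    }
    data.b[index] /= p;
}
```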

My current implementation of parallel gauss:

```rust
pub fn compute_gauss_p(data: Data) -> Data {
    let nsize = data.nsize;
    let sdata = Arc::new(data);

    for i in 0..nsize {
        pivot(&mut *sdata, i); // compile error: Arc does not implement DerefMut
        for n in 0..num_threads {
            let sdatacpy = Arc::clone(&sdata);
            pool.execute(move || {
                do_calc(sdatacpy, i + n);
            });
        }
        pool.join(); // finish this elimination step before starting the next
    }

    Arc::try_unwrap(sdata).unwrap()
}
```
```rust
fn do_calc(data: Arc<Data>, i: usize) { // compile error: cannot get &mut through an Arc
    let mut pivot_val;
    for j in (i + 1..data.matrix.len()).step_by(num_threads) {
        pivot_val = data.matrix[j][i];
        data.matrix[j][i] = 0.0;
        for k in i + 1..data.matrix.len() {
            data.matrix[j][k] -= pivot_val * data.matrix[i][k];
        }
        data.b[j] -= pivot_val * data.b[i];
    }
}
```

I have thought about using a single Mutex to guard the matrix, but that would serialize the work.
My next thought is to treat the matrix as a vector of mutexes, each containing a row. This would allow me to divide up the work. The matrix would become:

```rust
Vec<Mutex<Vec<f64>>>
```

However, I am wondering if this is the right path. I would like to avoid unsafe pointer manipulation (otherwise I could just use my C version) while sticking to a paradigm built around mpsc::channels (no rayon or other libraries that don't use channels).
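Sketched out (untested), one elimination step with that layout might look like the following, using plain `thread::spawn` plus an mpsc channel as the join mechanism. The `eliminate_step_mutex` name and round-robin distribution are mine, and `b` is omitted for brevity but would be handled the same way:

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

// One elimination step over a Vec<Mutex<Vec<f64>>>: rows are individually
// locked, so workers can update disjoint rows in parallel through a shared
// Arc. The pivot row is cloned out first so workers never hold two locks.
pub fn eliminate_step_mutex(
    matrix: &Arc<Vec<Mutex<Vec<f64>>>>,
    i: usize,
    num_threads: usize,
) {
    let n = matrix.len();
    let row_i = Arc::new(matrix[i].lock().unwrap().clone());
    let (tx, rx) = mpsc::channel();
    for t in 0..num_threads {
        let matrix = Arc::clone(matrix);
        let row_i = Arc::clone(&row_i);
        let tx = tx.clone();
        thread::spawn(move || {
            // Round-robin distribution, like the step_by in do_calc above.
            for j in (i + 1 + t..n).step_by(num_threads) {
                let mut row_j = matrix[j].lock().unwrap();
                let pivot_val = row_j[i];
                row_j[i] = 0.0;
                for k in i + 1..n {
                    row_j[k] -= pivot_val * row_i[k];
                }
            }
            tx.send(()).unwrap(); // signal completion over the channel
        });
    }
    for _ in 0..num_threads {
        rx.recv().unwrap(); // wait for every worker before the next step
    }
}
```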

My rough (untested) translation of this for rayon would look something like this:

```rust
pub fn compute_gauss(data: &mut Data) {
    for i in 0..data.nsize {
        pivot(data, i);

        let (row_i, rows) = data.matrix[i..data.nsize].split_first_mut().unwrap();

        rows.par_iter_mut().for_each(|row_j| {
            let pivot_val = row_j[i];
            row_j[i] = 0.0;
            for k in i + 1..data.nsize {
                row_j[k] -= pivot_val * row_i[k];
            }
            data.b[j] -= pivot_val * data.b[i]; // oops, `j` isn't in scope -- see edit below
        });
    }
}
```

You probably need to pull out references to those other `data` members too before the parallel loop to avoid borrowing conflicts in the closure. Anyway, the idea is to split the mutable part so it can work independently, which you can also do with other threadpool implementations.

edit: I missed that `data.b` is also modified -- definitely need to fix that, but you can split it the same way as the matrix and use a parallel `zip`.
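A standalone sketch of that split-and-zip shape, with plain iterators standing in for the rayon ones (swap `iter_mut` for `par_iter_mut` to parallelize; `eliminate_step` is just an illustrative name):

```rust
// One elimination step: split both the matrix and b at the pivot row, then
// walk the remaining rows and b-entries together with `zip`, so the closure
// never indexes b by a row number it doesn't have.
pub fn eliminate_step(matrix: &mut [Vec<f64>], b: &mut [f64], i: usize) {
    let n = matrix.len();
    let (row_i, rows) = matrix[i..n].split_first_mut().unwrap();
    let (b_i, b_rest) = b[i..n].split_first_mut().unwrap();
    let b_i = *b_i; // copy out the pivot entry of b

    // With rayon: rows.par_iter_mut().zip(b_rest.par_iter_mut())
    rows.iter_mut().zip(b_rest.iter_mut()).for_each(|(row_j, b_j)| {
        let pivot_val = row_j[i];
        row_j[i] = 0.0;
        for k in i + 1..n {
            row_j[k] -= pivot_val * row_i[k];
        }
        *b_j -= pivot_val * b_i;
    });
}
```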

I would try to split the data that the different threads have to access into contiguous blocks, then use `split_at_mut` and `chunks_mut` to actually obtain them and send them to the different threads.

This however requires `ThreadPool::execute` to accept non-`'static` closures, which doesn't seem to be the case. You need something like `rayon::scope` or `crossbeam::scope` for this, but at that point you may as well use rayon's `ParallelIterator`.
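For illustration, one elimination step with `std::thread::scope` (stable since Rust 1.63) and `chunks_mut` could look like this untested sketch; the function name and chunking scheme are mine:

```rust
use std::thread;

// Hand each scoped thread a disjoint chunk of the trailing rows. No Arc,
// Mutex, or unsafe is needed: the chunks are disjoint &mut borrows, and the
// scope joins all threads before returning.
pub fn eliminate_step_scoped(
    matrix: &mut [Vec<f64>],
    b: &mut [f64],
    i: usize,
    num_threads: usize,
) {
    let n = matrix.len();
    let (row_i, rows) = matrix[i..].split_first_mut().unwrap();
    let (b_i, b_rest) = b[i..].split_first_mut().unwrap();
    let row_i: &[f64] = row_i; // shared read-only view of the pivot row
    let b_i = *b_i;
    let chunk = (rows.len() / num_threads).max(1);
    thread::scope(|s| {
        for (rows_c, b_c) in rows.chunks_mut(chunk).zip(b_rest.chunks_mut(chunk)) {
            s.spawn(move || {
                for (row_j, b_j) in rows_c.iter_mut().zip(b_c.iter_mut()) {
                    let pivot_val = row_j[i];
                    row_j[i] = 0.0;
                    for k in i + 1..n {
                        row_j[k] -= pivot_val * row_i[k];
                    }
                    *b_j -= pivot_val * b_i;
                }
            });
        }
    });
}
```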
