How can I use the crate Rayon to multiply a matrix by a vector?

Hello, I am a beginner in rust and I would like to make faster the multiplication of a matrix by a vector with the crate Rayon.

Here is the original code:

fn multiply(matrix: Vec<Vec<f64>>, vector: Vec<f64>) -> Vec<f64> {

    let mut result: Vec<f64> = vec![0.0; matrix.len()];
    
    for i in 0..= matrix.len() - 1 {
        let mut x: f64 = 0.0;
        for j in 0..= vector.len() - 1 {
            x = (vector[j] * matrix[i][j]) + x;
        }
        result[i] = x;
    }
    return result;
}

I have many difficulties using Rayon to make this code work in parallel, here one of my unsuccessful attempts:

fn sum_of_squares(matrix: &Vec<Vec<f64>>, vector: &Vec<f64>) -> Vec<f64> {
    matrix.par_iter().for_each(
        |matrix: Vec<f64>|
            vector.par_iter()
                .map(|&j| j * matrix[j])
                .sum()
            )
}

I would be grateful if someone could help me to understand how to deal with vectors and matrices using Rayon.

If you care about performance, you should look for a parallel blas library.

If this is for educational purposes, you want to store the matrix as a vector of rows. Then, each of the '.par_iter().map(...)' computes a dot product, which you collect into a Vec<f32>

Don't use Vec<Vec. This is not a 2d matrix. It's a jagged array, and it adds cost of indirection and extra redundant data for each row. You should use 1d Vec (or fixed-length array) indexed by [row * size + column] instead. Then you can use onedimvec.chunks_exact(size) to get an iterator of rows. You can't get an iterator of columns a without writing a custom unsafe iterator.

Rayon needs the tasks to be big. In my tests anything below 10000 operations is slower with rayon, because cost of launching a task is larger than savings from parallelizing it. Use it only if your matrices are very big, otherwise it won't make sense.

I recommend using existing crates for this, maybe this:

or

Hello, thanks Kornel for your help, I applied what you said about using only one vector, and this is what I did:

pub fn multiply(matrix: &Vec<f64>, vector: &Vec<f64>) -> Vec<f64> {

    let result_lenght: usize = matrix.len() / vector.len();
    let mut result: Vec<f64> = vec![0.0; result_lenght];
    let mut counter: usize = 0;
    
    for i in 0..= result_lenght - 1 {
        let mut x: f64 = 0.0;

        for j in 0..= vector.len() - 1 {
            x = (&vector[j] * &matrix[counter]) + x;
            counter = counter + 1;
        }

        result[i] = x;
    }
    return result;
}

Here I folded the vector who is the matrix into segments with the length of the vector in order to create the rows.

Since I have a counter, I don't think making the calculations in parallel will be a good idea, I could delete the counter in the columns by making:

for i in 0..= result_lenght - 1 {

    let mut x: f64 = 0.0;

    for j in 0..= vector.len() - 1 {
        x = (&vector[j] * &matrix[counter + j]) + x;
    }

    counter = counter + vector.len();
    result[i] = x;
}

But that option adds one extra calculation per row and to be very useful, in parallel, I would need, as you said, 10 000 elements per row, which will never happen.

This is the best I can do... I am sorry for not applying the other things you said, but I am keeping them for later.

I want to thank you too, zeroexcuses, for taking the time to answer me.

1 Like

Just to be clear, I fully agree with @kornel's statement here. My comment above was meant to say:

Since you're already using Vec<Vec<_>>, you want to use a Vec of rows (rather than a Vec of columns). With a Vec of rows, you only have to do dot products in parallel; with Vec of columns, you had to do a Vector summation step in the end.

To be honest, with:

fn sum_of_squares(matrix: &Vec<Vec<f64>>, vector: &Vec<f64>) -> Vec<f64> {
    matrix.par_iter().for_each(
        |matrix: Vec<f64>|
            vector.par_iter()
                .map(|&j| j * matrix[j])
                .sum()
            )
}

I don't know how to deal with the |matrix: Vec| as an iterator and I am more confused about how I could store the result in a vector.

Since I started rust less than one year, I prefer to have the most algoritmique way of thinking to avoid having to deal with things I don't fully understand.

That is also the reason I didn't use the crates that Kornel gave me... I have a lot to learn ^^'' !

Disclaimer: not tested:

fn sum_of_squares(matrix: &Vec<Vec<f64>>, vector: &Vec<f64>) -> Vec<f64> {
    matrix.par_iter().map(
        |row: Vec<f64>|
            row.iter().enumerate()
                .map(|(idx, &j|) j * vector[idx])
                .sum()
            ).collect::<Vec<_>()
}

The three changes are

  1. for_each -> map and
  2. adding a collect::<Vec<_>() to collect the dot products into a Vec.
  3. also, the inner map, for dot product, does not need a par_iter; since we already have each dot product in it's own thread

Again, this is all for educational purposes to demonstrate how to use rayon; for production, use a matrix library.

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.