Problem with mutation in closures

ManuelCostanzo · May 20, 2020, 4:09pm

Hello, i'm having a problem with the following code:

fn floyd_version_parallel(graph: &mut Array2<f32>) {
for k in 0..MAX {
    (0..NUM_THREADS).into_par_iter().for_each(|id| {
        for i in
            (id * (MAX / NUM_THREADS))..(id * (MAX / NUM_THREADS) + (MAX / NUM_THREADS) - 1)
        {
            for j in 0..MAX {
                unsafe {
                    let sum = *graph.uget((i, k)) + *graph.uget((k, j));
                    if sum < *graph.uget_mut((i, j)) {
                        *graph.uget_mut((i, j)) = sum;
                    }
                }
            }
        }
    });
}
}

In this code i'm using ndarray and rayon. I want to paralelize the compute so i made an into_par_iter. My problem is ** graph.uget_mut((i, j))* because i need a muted cell, that it's the following:

cannot borrow *graph as mutable, as Fn closures cannot mutate their captured variables

I could make a Mutex for all the graph, but i only need to modify one cell per time (i,j). The other cells (i,k and k,j) i use for read only.

How can i solve this ? Thank you so much !

red75prime · May 20, 2020, 7:27pm

Floyd's algorithm cannot be parallelized in such a manner. This Wikipedia link has a scheme of what needs to be done.

alice · May 20, 2020, 7:38pm

To safely share data with rayon, any region of memory that you wish to mutate from multiple threads must be shared by having rayon's par_iter construct (or similar) split it, which makes it available as an argument to the closure. You can't just move the &mut T into the closure, because a mutable reference requires exclusive access to what it points at, and moving it to multiple threads normally would violate that.

For example, to add five to every element in an array, you must do this:

let mut v = vec![1, 2, 3, 4];

v.par_iter_mut()
    .for_each(|x| {
        *x += 5;
    });

playground

This will not compile:

let mut v = vec![1, 2, 3, 4];

(0..v.len()).into_par_iter()
    .for_each(|i| {
        v[i] += 5;
    });

playground

The difference is that the first version ensures that each call to the closure has access only to that specific item in the vector. This means that there aren't several threads with &mut T references that overlap, which is the issue in the second version where every spawned task has a mutable reference to the vector, and those all overlap with each other.

ManuelCostanzo · May 20, 2020, 8:21pm

@alice hanks for the response.

I understand what you say. My problem is that i need to do something similar than your second example. Using Mutex in all of the graph worked, but works like 4 times worse than secuencial algorithm. How can i remove the overhead of locks ?

alice · May 20, 2020, 8:27pm

Parallelizing algorithms with data dependencies criss-crossing like that can be very difficult. Take a look at the wiki link above, which talks about some approaches to do it.

ManuelCostanzo · May 20, 2020, 9:18pm

@alice Yes, you are right, but for example: my secuencial version takes 600ms and my parallel version 3 seconds. And that is cause by the .lock.

I know that i will not get a correct result, but it's just an example:

pub struct MyData {
    data: Array2<f32>,
}

impl MyData {
    // Another static method, taking two arguments:
    fn new(data: Array2<f32>) -> MyData {
        MyData { data: data }
    }
}

fn floyd_version_parallel(graph_aux: &mut Array2<f32>) {
    let graph = Mutex::new(MyData::new(graph_aux.clone()));
    for k in 0..MAX {
        (0..NUM_THREADS).into_par_iter().for_each(|id| {
            let mut graph = graph.lock().unwrap();
            for i in
                (id * (MAX / NUM_THREADS))..(id * (MAX / NUM_THREADS) + (MAX / NUM_THREADS) - 1)
            {
                for j in 0..MAX {
                    unsafe {
                            let sum = *graph.data.uget((i, k)) + *graph.data.uget((k, j));
                            if sum < *graph.data.uget((i, j)) {
                                *graph.data.uget_mut((i, j)) = sum;
                            }
                    }
                }
            }
        });
    }
}

This versions takes at least 2 seconds more than the following secuencial version:

fn floyd_version(graph: &mut Array2<f32>) {
let mut sum;
for k in 0..MAX {
    for i in 0..MAX {
        for j in 0..MAX {
            unsafe {
                sum = *graph.uget((i, k)) + *graph.uget((k, j));
                if sum < *graph.uget_mut((i, j)) {
                    *graph.uget_mut((i, j)) = sum;
                }
            }
        }
    }
}

}

What i'm doing wron ? Thank you !

system · August 18, 2020, 9:19pm

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Cannot borrow `graph` as immutable because it is also borrowed as mutable help	4	551	August 19, 2020
Rayon is slower than serial algorithm	15	1973	October 30, 2020
Rayon par_iter_mut slower than serial help	4	1136	November 23, 2020
Cannot borrow as mutable, as it is a captured variable in a `Fn` closure	8	160	March 30, 2024
Parallel reference processing with mutation	11	762	September 17, 2021

Problem with mutation in closures

Related Topics