How to optimize this code pattern with rayon?

Fomalhauthmj · October 18, 2023, 1:02pm

Hi everyone! I have simplified my problem into the following code example.
Specifically, because of the index (a mapping from each item to an output slot), I need a separate loop to aggregate the results of the parallel iterator.
Can this aggregation loop be avoided with rayon or some other method?

use rayon::prelude::*; // rayon 1.7.0
use std::collections::HashMap;
fn main() {
    let index = HashMap::from([(0, 0), (1, 1), (2, 0), (3, 1)]);
    let results: Vec<(usize, usize)> = (0..4).into_par_iter().map(|i| (index[&i], i * i)).collect();
    // Sequential aggregation loop: add each delta into its mapped slot.
    let mut boxed = vec![0; 2];
    for (index, delta) in results {
        boxed[index] += delta;
    }
    println!("{:?}", boxed);
}

tristan · October 18, 2023, 1:59pm

Not without some kind of synchronization on the stored results (e.g. a Mutex or an atomic type). In that case you should benchmark with your real-world data to see whether it actually pays off: the added synchronization may well make things slower than your current solution.

Here's an example using Atomics to store the results.

use rayon::prelude::*;
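use std::collections::HashMap;
use std::sync::atomic::{AtomicU32, Ordering};
fn main() {
    let index = HashMap::from([(0, 0), (1, 1), (2, 0), (3, 1)]);
    // One atomic counter per output slot; threads add into it concurrently.
    let boxed = vec![AtomicU32::new(0), AtomicU32::new(0)];
    (0..4).into_par_iter().for_each(|i| {
        let slot = index[&i];
        // Ordering::Relaxed would also be enough here: the sums are only
        // read after for_each returns, i.e. after all workers are done.
        boxed[slot].fetch_add(i * i, Ordering::SeqCst);
    });

    println!("{:?}", boxed);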
}
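The reply also mentions a Mutex as the other synchronization option. Below is a minimal sketch of that variant, assuming one Mutex per output slot (mirroring the per-slot atomics above) and using `std::thread::scope` with a fixed thread count in place of rayon, so it compiles with the standard library alone. The helper `bucket_sums_mutex` and its parameters are hypothetical names for illustration. Taking a lock per item is likely to cost more than an atomic add, so this is mainly useful when the per-slot value is not a type with an atomic counterpart.

```rust
use std::collections::HashMap;
use std::sync::Mutex;
use std::thread;

// Hypothetical helper: sums i * i into slot index[&i] for i in 0..n,
// using one Mutex per slot and `threads` scoped threads (not rayon).
fn bucket_sums_mutex(
    index: &HashMap<usize, usize>,
    n: usize,
    slots: usize,
    threads: usize,
) -> Vec<usize> {
    // One lock per output slot, so threads only contend on the same slot.
    let boxed: Vec<Mutex<usize>> = (0..slots).map(|_| Mutex::new(0)).collect();
    let threads = threads.max(1);
    let chunk = (n + threads - 1) / threads; // ceiling division
    thread::scope(|s| {
        for t in 0..threads {
            let boxed = &boxed;
            s.spawn(move || {
                // Each thread handles a contiguous range of items.
                let end = ((t + 1) * chunk).min(n);
                for i in (t * chunk)..end {
                    *boxed[index[&i]].lock().unwrap() += i * i;
                }
            });
        }
    });
    // All threads have joined once scope returns; unwrap the final sums.
    boxed.into_iter().map(|m| m.into_inner().unwrap()).collect()
}

fn main() {
    let index = HashMap::from([(0, 0), (1, 1), (2, 0), (3, 1)]);
    println!("{:?}", bucket_sums_mutex(&index, 4, 2, 2)); // prints [4, 10]
}
```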

steffahn · October 18, 2023, 4:16pm

It looks like you can aggregate parts of the sequence with .fold(…), then reduce the partial results down to a single one afterwards, with both steps still running in parallel.

use rayon::prelude::*;
use std::collections::HashMap;
fn main() {
    let index = HashMap::from([(0, 0), (1, 1), (2, 0), (3, 1)]);
    let boxed = (0..4)
        .into_par_iter()
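        .map(|i| (index[&i], i * i))
        // Each rayon split accumulates into its own local vector.
        .fold(
            || vec![0; 2],
            |mut boxed, (index, delta)| {
                boxed[index] += delta;
                boxed
            },
        )
        // Merge the per-split vectors by element-wise addition.
        .reduce_with(|mut boxed1, boxed2| {
            std::iter::zip(&mut boxed1, &boxed2).for_each(|(i, j)| *i += *j);
            boxed1
        })
        // reduce_with returns None for an empty iterator.
        .unwrap_or_else(|| vec![0; 2]);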
    println!("{:?}", boxed);
}
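The fold-then-reduce shape can also be seen without rayon. This sketch assumes a fixed number of `std::thread::scope` threads working on contiguous chunks, in place of rayon's adaptive splitting. Each thread folds its chunk into a private partial vector, and the partials are summed element-wise at the end. No locks or atomics are needed because no slot is shared while the threads run. `bucket_sums_fold` and its parameters are hypothetical names for illustration.

```rust
use std::collections::HashMap;
use std::thread;

// Hypothetical helper: fold-then-reduce with scoped threads standing in
// for rayon. Each thread folds its chunk into a private partial vector;
// the partials are then summed element-wise.
fn bucket_sums_fold(
    index: &HashMap<usize, usize>,
    n: usize,
    slots: usize,
    threads: usize,
) -> Vec<usize> {
    let threads = threads.max(1);
    let chunk = (n + threads - 1) / threads; // ceiling division
    let partials: Vec<Vec<usize>> = thread::scope(|s| {
        let handles: Vec<_> = (0..threads)
            .map(|t| {
                s.spawn(move || {
                    // "fold": accumulate this thread's items into a local vector.
                    let end = ((t + 1) * chunk).min(n);
                    ((t * chunk)..end).fold(vec![0; slots], |mut acc, i| {
                        acc[index[&i]] += i * i;
                        acc
                    })
                })
            })
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    });
    // "reduce": add the partial vectors together element-wise.
    partials.into_iter().fold(vec![0; slots], |mut total, part| {
        for (sum, x) in total.iter_mut().zip(part) {
            *sum += x;
        }
        total
    })
}

fn main() {
    let index = HashMap::from([(0, 0), (1, 1), (2, 0), (3, 1)]);
    println!("{:?}", bucket_sums_fold(&index, 4, 2, 2)); // prints [4, 10]
}
```

With rayon itself, calling `.reduce(|| vec![0; 2], op)` on the folded iterator takes an identity closure and returns the vector directly, which would replace the `reduce_with(…).unwrap_or_else(…)` pair in the reply above.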
