Write to a HashMap in parallel iterator

Here's an MWE:

```rust
use std::collections::HashMap;
use rayon::prelude::*;

#[derive(Clone, Debug, Eq, Hash, PartialEq)]
struct Car {
    year: u32,
    make: String,
    model: String,
}

fn main() {
    let cars: Vec<Car> = vec![
        Car { year: 2021, make: "Honda".to_string(), model: "Accord".to_string() },
        Car { year: 1995, make: "Ford".to_string(), model: "F-150".to_string() },
        Car { year: 2021, make: "Honda".to_string(), model: "Civic".to_string() },
        Car { year: 2000, make: "Ford".to_string(), model: "Mustang".to_string() },
        Car { year: 1995, make: "Peugeot".to_string(), model: "405".to_string() },
    ];

    let mut siblings: HashMap<Car, Vec<Car>> = HashMap::new();
    cars.iter()
        .enumerate()
        .for_each(|(i, car)| {
            for j in i + 1..cars.len() {
                if car.make == cars[j].make {
                    if siblings.contains_key(&car) {
                        let v = siblings.get_mut(&car).unwrap();
                        v.push(cars[j].clone());
                    } else {
                        siblings.insert(car.clone(), vec![cars[j].clone()]);
                    }
                }
            }
        });

    println!("{:#?}", siblings);
}
```

This works just fine. However, changing `iter()` to `par_iter()` results in an error telling me that `siblings` cannot be borrowed as mutable because it's captured in a `Fn` closure. I'm very inexperienced with Rust; some searching turned up information about `Arc` and `Mutex`, but I haven't figured out how to make that work here.

I would greatly appreciate some guidance on how to parallelize the iterator (and eventually the inner loop as well :grimacing:). Any other feedback to improve this bit of code is also very welcome!!

This is because running the iterator in parallel would access `siblings` from multiple threads at the same time, which is dangerous: threads could overwrite each other's updates.

Your task is not well-suited to being run in parallel. You could wrap `siblings` in a `Mutex`, but a `Mutex` removes the parallelism: it makes threads wait and lets only one of them run at a time. You'd be paying for a parallel iterator only to execute a serial operation. A waste of effort.

If you can, leave it single-threaded.

If you really need it to be faster, you'll have to redesign the process to collect into many separate `siblings` HashMaps first — one per thread — and merge the HashMaps together afterwards (similar to map-reduce). This has higher overhead overall and uses more CPU, but it may be faster in wall-clock time if you have millions of objects to merge like that.


Thank you for the explanation and advice, I appreciate it! I have a single-threaded version working, but for my real problem I have a couple hundred thousand structs (cars, in this toy example) to iterate over, and it's going to take hours to finish. I'll look into creating separate HashMaps and then map-reducing them into a single HashMap at the end :grinning:

MapReduce frameworks typically run the Map in parallel but the Reduce sequentially, so it's necessary to do all the heavy work in the Map phase and keep the Reduce phase as lightweight as possible.

The `.for_each()` here effectively acts as the Reduce function. That's why it can't be parallelized easily.

Another option is to switch to a concurrent hash map, such as dashmap. It even has native rayon support, including `FromParallelIterator` so you can `collect()` into it, though that literally just calls `insert`.


This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.