Reason for the speed difference for mpsc vs arc+mutex

This is an exercise from exercism. I previously completed it with mpsc and I am re-attempting the problem with Arc and Mutex after watching a tutorial here https://www.youtube.com/watch?v=2mwwYbBRJSo. Just out of curiosity, why is the Arc+Mutex version slower?

arc+mutex version

use std::collections::HashMap;
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;

pub fn frequency(input: &[&str], worker_count: usize) -> HashMap<char, usize> {
    let result = Arc::new(Mutex::new(HashMap::<char, usize>::new()));

    input
        .chunks((input.len() as f64 / worker_count as f64).ceil() as usize)
        .enumerate()
        .map(|(_, chunk)| {
            let chunk = chunk
                .iter()
                .map(|&sentence| String::from(sentence))
                .collect::<Vec<String>>();

            let rresult = result.clone();

            thread::spawn(move || {
                chunk.iter().for_each(|sentence| {
                    sentence
                        .to_lowercase()
                        .chars()
                        .filter(|x| x.is_alphabetic())
                        .for_each(|char| {
                            rresult
                                .lock()
                                .unwrap()
                                .entry(char)
                                .and_modify(|e| *e += 1)
                                .or_insert(1);
                        });
                })
            })
        })
        .for_each(|handle| handle.join().unwrap());

    Arc::try_unwrap(result).unwrap().into_inner().unwrap()
}

with the benchmark provided by exercism

jeffrey04@NOBITA-UBUNTU:exercism/rust/parallel-letter-frequency is 📦 v0.0.0 via 🦀 v1.52.1 [I]  rustup run nightly cargo bench                                                                                                                                                   
   Compiling parallel-letter-frequency v0.0.0 (/home/jeffrey04/Projects/exercism/rust/parallel-letter-frequency)                                                                                                                                                                  
    Finished bench [optimized] target(s) in 1.28s                                                                                                                                                                                                                                 
     Running unittests (target/release/deps/parallel_letter_frequency-17de0f2131b2cbf0)                                                                                                                                                                                           
                                                                                                                                                                                                                                                                                  
running 0 tests                                                                                                                                                                                                                                                                   
                                                                                                                                                                                                                                                                                  
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s                                            
                                                                    
     Running unittests (target/release/deps/benchmark-78e9a2958cadb479)                                                                  
                                                                    
running 6 tests                                                                                                                          
test bench_large_parallel   ... bench:     951,516 ns/iter (+/- 129,201)                                                                 
test bench_large_sequential ... bench:     471,087 ns/iter (+/- 6,666)                                                         
test bench_small_parallel   ... bench:     102,609 ns/iter (+/- 9,290)                                                                   
test bench_small_sequential ... bench:      16,351 ns/iter (+/- 169)                                                                     
test bench_tiny_parallel    ... bench:      24,885 ns/iter (+/- 2,674)                                                                   
test bench_tiny_sequential  ... bench:          56 ns/iter (+/- 1)

test result: ok. 0 passed; 0 failed; 0 ignored; 6 measured; 0 filtered out; finished in 19.24s

and the mpsc version

pub fn frequency(input: &[&str], worker_count: usize) -> HashMap<char, usize> {
    if input.len() == 0 {
        return HashMap::new();
    }

    let (tx, rx) = mpsc::channel();

    input
        .chunks((input.len() as f64 / worker_count as f64).ceil() as usize)
        .for_each(move |chunk| {
            let ttx = mpsc::Sender::clone(&tx);
            let chunk = chunk.iter().map(|&x| String::from(x)).collect::<Vec<_>>();

            thread::spawn(move || {
                chunk.iter().for_each(|sentence| {
                    ttx.send(
                        sentence
                            .to_lowercase()
                            .chars()
                            .filter(|x| x.is_alphabetic())
                            .fold(HashMap::new(), |mut current, incoming| {
                                *current.entry(incoming).or_insert(0) += 1;
                                current
                            }),
                    )
                    .unwrap()
                });
            });
        });

    rx.iter().fold(HashMap::new(), |result, incoming| {
        incoming
            .into_iter()
            .fold(result, |mut current, (key, count)| {
                *current.entry(key).or_insert(0) += count;
                current
            })
    })
}

with benchmark

jeffrey04@NOBITA-UBUNTU:exercism/rust/parallel-letter-frequency is 📦 v0.0.0 via 🦀 v1.52.1 [I]  rustup run nightly cargo bench

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running unittests (target/release/deps/benchmark-78e9a2958cadb479)

running 6 tests
test bench_large_parallel   ... bench:     468,427 ns/iter (+/- 77,755)
test bench_large_sequential ... bench:     563,890 ns/iter (+/- 6,348)
test bench_small_parallel   ... bench:      67,903 ns/iter (+/- 11,414)
test bench_small_sequential ... bench:      19,147 ns/iter (+/- 233)
test bench_tiny_parallel    ... bench:      25,554 ns/iter (+/- 2,536)
test bench_tiny_sequential  ... bench:          56 ns/iter (+/- 10)

test result: ok. 0 passed; 0 failed; 0 ignored; 6 measured; 0 filtered out; finished in 12.98s

What does the sequential benchmark do? Is there a pre-implemented single-threaded version for comparison?

The benchmarking script is here, apparently they have a simple single-threaded version for comparison

In the Arc+Mutex version, you are repeatedly locking the mutex around the hash map, for the insertion of every single character. Meanwhile in the MPSC version, you are folding with the HashMap directly, so locking happens much less frequently, namely only when you send something through the channel.

2 Likes