Correct way of collecting rayon 2d computations

I'm computing distances between vectors of strings and using rayon to speed up the computation, since there may be ~10^8 comparisons. I expect that collecting that many results will take a few seconds. However, I'm new to Rust and want to make sure I'm collecting the results correctly, because most of the function's time appears to be spent collecting them.
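
For scale, assuming all ~10^8 results are kept as `u16` (2 bytes each), the final vector alone is about 200 MB:

$$10^{8} \times 2\ \text{bytes} = 2 \times 10^{8}\ \text{bytes} \approx 200\ \text{MB}\ (\approx 191\ \text{MiB})$$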

Here is roughly what I'm doing:

use once_cell::sync::Lazy;
use rayon::prelude::*;

pub static POOL: Lazy<rayon::ThreadPool> = Lazy::new(|| {
    rayon::ThreadPoolBuilder::new()
        .num_threads(
            std::thread::available_parallelism()
                .unwrap_or(std::num::NonZeroUsize::new(1).unwrap())
                .get(),
        )
        .thread_name(move |i| format!("{}-{}", "tcrdist", i))
        .build()
        .expect("could not spawn threads")
});

pub fn compute_distances(seqs1: Vec<&str>, seqs2: Vec<&str>) -> Vec<u16> {
    POOL.install(|| {
        seqs1
            .par_iter()
            .flat_map(|&s1| {
                seqs2
                    .iter()
                    .map(|&s2| dist_func(s1, s2))
                    .collect::<Vec<u16>>()
            })
            .collect()
    })
}

Should I be using two collects, or can this be done with just one?

The actual code can be found here.

Thanks in advance!

Using flat_map_iter without the inner collect should do a little better: you're no longer creating a temporary vector only to have Rayon needlessly parallelize over it again.

        seqs1
            .par_iter()
            .flat_map_iter(|&s1| {
                seqs2
                    .iter()
                    .map(|&s2| dist_func(s1, s2))
            })
            .collect()

But even that is a little inefficient, because Rayon can't "see" the total size up front, so it can't place each entry directly in its final location. Instead, it still builds its own temporary vector for each "job" and merges them at the end.
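
To make "placing each entry in its final location" concrete, here is a minimal std-only sketch. It deliberately uses `std::thread::scope` and `chunks_mut` instead of Rayon, so it is a different technique from the indexed collect: the result is allocated once at its full size and each thread fills a disjoint block of rows in place, so nothing has to be merged afterwards. `dist_func` here is a hypothetical toy metric, not the one from the linked code, and the rows-per-thread split is an assumption.

```rust
use std::thread;

// Hypothetical stand-in for the question's `dist_func` (the real one is in the
// linked code): mismatched bytes at equal positions plus the length difference.
// The `as u16` cast truncates on overflow; fine for short strings.
fn dist_func(a: &str, b: &str) -> u16 {
    let mismatches = a.bytes().zip(b.bytes()).filter(|(x, y)| x != y).count();
    (mismatches + a.len().abs_diff(b.len())) as u16
}

/// Allocates the full row-major result once, then has each scoped thread
/// write its block of rows directly into place:
/// out[i * seqs2.len() + j] == dist_func(seqs1[i], seqs2[j]).
fn compute_distances(seqs1: &[&str], seqs2: &[&str], num_threads: usize) -> Vec<u16> {
    let n2 = seqs2.len();
    let mut out = vec![0u16; seqs1.len() * n2];
    if out.is_empty() {
        return out; // also avoids chunks_mut(0), which would panic
    }
    // Contiguous block of whole rows per thread (an assumed, simple split).
    let t = num_threads.max(1);
    let rows_per_thread = (seqs1.len() + t - 1) / t;
    thread::scope(|scope| {
        // Pair each output block with the seqs1 rows it is responsible for.
        for (block, rows) in out
            .chunks_mut(rows_per_thread * n2)
            .zip(seqs1.chunks(rows_per_thread))
        {
            scope.spawn(move || {
                for (row, &s1) in block.chunks_mut(n2).zip(rows) {
                    for (slot, &s2) in row.iter_mut().zip(seqs2) {
                        *slot = dist_func(s1, s2);
                    }
                }
            });
        }
    }); // all scoped threads are joined here
    out
}
```

For example, with `seqs1 = ["CASSL", "CASRG", "CAT"]`, `seqs2 = ["CASSL", "CAS"]` and 2 threads, this toy metric gives `[0, 2, 2, 2, 3, 1]`.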

You can trick the flat_map_iter version into an indexed collect with a little manual index arithmetic, like:

    (0..seqs1.len() * seqs2.len())
        .into_par_iter()
        .map(|i| {
            let s1 = &seqs1[i / seqs2.len()]; // row: which element of seqs1
            let s2 = &seqs2[i % seqs2.len()]; // column: which element of seqs2
            dist_func(s1, s2)
        })
        .collect()

Then Rayon knows exactly where each map item should go, and it will collect directly into the result Vec.
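
As a sanity check on that index arithmetic, here is a small std-only sketch (no Rayon) showing that splitting a flat index `i` into `(i / n2, i % n2)` visits the pairs in the same row-major order as the nested `seqs1`/`seqs2` loops, so the indexed version produces the same layout as the `flat_map_iter` one. The helper names `split_index`, `flat_index`, `pairs_flat` and `pairs_nested` are hypothetical, not from the thread.

```rust
/// Splits a row-major flat index into (row into seqs1, column into seqs2).
fn split_index(i: usize, n2: usize) -> (usize, usize) {
    (i / n2, i % n2)
}

/// Inverse mapping: where the distance for (seqs1[row], seqs2[col]) lives
/// in the flat result Vec.
fn flat_index(row: usize, col: usize, n2: usize) -> usize {
    row * n2 + col
}

/// Pairs in the order the indexed `0..n1 * n2` range produces them.
fn pairs_flat(n1: usize, n2: usize) -> Vec<(usize, usize)> {
    (0..n1 * n2).map(|i| split_index(i, n2)).collect()
}

/// Pairs in the order the nested (flat_map-style) loops produce them.
fn pairs_nested(n1: usize, n2: usize) -> Vec<(usize, usize)> {
    let mut pairs = Vec::with_capacity(n1 * n2);
    for row in 0..n1 {
        for col in 0..n2 {
            pairs.push((row, col));
        }
    }
    pairs
}
```

For example, with 4 columns, `split_index(7, 4)` is `(1, 3)`, and reading a distance back out of the flat result is `out[flat_index(row, col, seqs2.len())]`.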
