I'm computing distances between every pair of strings from two vectors and using rayon to speed up the computation, since there may be around 10^8 comparisons. I expect collecting that many results to take a few seconds, but I'm new to Rust and want to make sure I'm collecting the result correctly, because most of the function's latency appears to be in the collection step.
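A quick back-of-the-envelope check on the result size, assuming one `u16` per pair stored in a single flat `Vec<u16>`. The helper name `output_bytes` and the 10,000 by 10,000 split are illustrative only. At 2 bytes per distance, 10^8 distances come to about 200 MB for the final vector alone, before counting any intermediate buffers.

```rust
/// Bytes needed for an n x m distance matrix stored as a flat Vec<u16>
/// (one u16 per (s1, s2) pair), ignoring the Vec header and spare capacity.
fn output_bytes(n: usize, m: usize) -> usize {
    n * m * std::mem::size_of::<u16>()
}

fn main() {
    // Hypothetical sizes: 10_000 x 10_000 = 10^8 comparisons.
    let bytes = output_bytes(10_000, 10_000);
    let mb = bytes / 1_000_000;
    println!("{bytes} bytes (~{mb} MB)"); // prints "200000000 bytes (~200 MB)"
}
```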
Here is a simplified version of what I'm doing:
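```rust
use once_cell::sync::Lazy;
use rayon::prelude::*;

// Dedicated rayon pool with one thread per available core (falling back to 1).
pub static POOL: Lazy<rayon::ThreadPool> = Lazy::new(|| {
    rayon::ThreadPoolBuilder::new()
        .num_threads(
            std::thread::available_parallelism()
                .unwrap_or(std::num::NonZeroUsize::new(1).unwrap())
                .get(),
        )
        .thread_name(move |i| format!("{}-{}", "tcrdist", i))
        .build()
        .expect("could not spawn threads")
});

// `dist_func` (not shown) returns a u16 distance for each pair of strings.
pub fn compute_distances(seqs1: Vec<&str>, seqs2: Vec<&str>) -> Vec<u16> {
    POOL.install(|| {
        seqs1
            .par_iter()
            .flat_map(|&s1| {
                // First collect: one Vec<u16> per s1 (its distances to every s2).
                seqs2
                    .iter()
                    .map(|&s2| dist_func(s1, s2))
                    .collect::<Vec<u16>>()
            })
            // Second collect: concatenate all rows into a single Vec<u16>.
            .collect()
    })
}
```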
Should I be using two `collect` calls, or is there a way to do this with just one?
The actual code is found here.
Thanks in advance!