Rayon's parallel iterators work great when each calculation is independent from every other calculation. For example, we can multiply each number in an array by two and then sum()
the products, where summation is an example of an associative operation we can use with reduce_with()
.
let data: Vec<i32> = vec![1, 2, 3, 4];
let result: i32 = data.par_iter().map(|x| x * 2).sum();
assert!(result == 1*2 + 2*2 + 3*2 + 4*2);
This gets to be a little trickier if we need some sequential dependency between calculations. Suppose we want to multiply each number by two, then take the difference with the previous product, and then sum the results? This is sort of operation is very common, for example, when computing a derivative or convolving with some kernel.
Solution: There is no way to express this kind of sequential dependency in Rayon as of now. There is a feature request to convert parallel iterators to sequential iterators without
collect()
that would make this possible, but it dates back to 2017 and won't likely me merged any time soon.If your first operation is distributive (scalar multiplication is) and you have a slice of the data you want to work on already in memory:
let data: Vec<i32> = vec![1, 2, 3, 4]; let result: i32 = data.par_windows(2) // requires a slice .map(|v| 2 * (v[1] - v[0])) .sum();
Or if you don't have the slice in memory you can use a sequential iterator like
Itertools::tuple_windows()
to feed a parallel iterator:let data: Vec<i32> = vec![1, 2, 3, 4]; let data = data.into_iter(); // don't need all data in memory at once let result: i32 = data.tuple_windows::<(_,_)>() .par_bridge() .map(|(x, y)| 2 * (y - x)) .sum();
If you can't distribute the first operation or you don't have all the data in memory then it's about trade-offs. If you're memory constrained then be prepared to do the first op serially or redundantly in parallel:
// Serial scalar multiplication, then parallel reduction. data.tuple_windows::<(_,_)>().map(|x| x*2).par_bridge().map(|(x,y)| y-x).sum(); // Everything's parallel, but multiplication done twice. data.tuple_windows::<(_,_)>().par_bridge().map(|x,y| (2*x,2*y)).map(|(x,y)| y-x).sum();
If the first op is computationally expensive and you're not memory constrained then
collect()
partway through like this:let data: Vec<i32> = data.par_iter().map(|x| x*2).collect(); let result: i32 = data.par_windows(2).map(|v| 2 v[1] - v[0]).sum();
See also https://users.rust-lang.org/t/parallel-work-collected-sequentially/13504.
Thanks to @dthul for pointing out
ParallelSlice::par_windows()
and @drewkett for pointing outItertools::tuple_windows()
, which make these concise workarounds possible. The rest of my original post continues below for posterity.
The obvious (to me) solution is to perform the first operation (multiplication) in parallel, then collect the results, and then perform the dependent (difference) operation in parallel:
let data: Vec<i32> = vec![1, 2, 3, 4];
let products: Vec<i32> = data.par_iter().map(|x| x * 2).collect();
let result: i32 = products.iter().zip(products.iter().skip(1)).par_bridge().map(|(x,y)| y - x).sum();
assert!(result == (2*2 - 1*2) + (3*2 - 2*2) + (4*2 - 3*2));
However, imagine the data are very long and actually come from a lazily-evaluated iterator instead of a vector. Collecting them into the intermediate products
vector could cost a lot of memory! Is there a more efficient/idiomatic way to achieve this sort of non-associative reduction, or to deal with sequential dependencies between parallel calculations in general?