 # Optimize array sums

Hello !

I need to sum each element of the array `a` with each element of `b`. My first attempt is this:

``````let sums = a
.iter()
.flat_map(|ik| b.iter().map(move |kj| ik + kj))
.collect::<Vec<_>>();
``````

Do you have any idea how to improve performance on this?

Thank you !

It would help to pre-allocate the full capacity, because the iterator can't predict the number of items from `flat_map`.

`a` and `b` are slices with N elements. They were allocated before. Why the iterator could't predict ?

Because the iterator doesn't know what that closure is doing, only the type that it returns.

1 Like

With this happens the same, I think that this could be improved

``````    let mut sums = Vec::with_capacity(block_length);
sums = a
.iter()
.flat_map(|ik| b.iter().map(move |kj| ik + kj))
.collect();``````

Perhaps with an external crate ?

The `collect` creates a new collection, throwing away your reserved one. Instead, try `sums.extend(a.iter().flat_map(...))`.

2 Likes

You might be interested in https://docs.rs/itertools/0.9.0/itertools/macro.iproduct.html

With `extend` it's the same as before.

I tried with `iproduct!` but is is extremely slow, I don't know why

Are you building in `--release` mode? The default is debug mode, which can be orders of magnitude slower.

Yes ! With features for avx512

and with this?

``````extern crate num;
use num::Num;

fn sum_slices<T: Num + Copy>(slice1: &[T], slice2: &[T]) -> Vec<T> {
slice1.iter().zip(slice2).map(|(&a, &b)| a + b).collect()
}

fn main() {
let a = [1, 2, 3, 4, 5, 10, 10];
let b = [2, 3, 45, 6, 6, 10, 10];

let sum = sum_slices(&a, &b);
println!("sum: {:?}", sum);

}
``````

It seems that the compiler can not vectorize the flat_map, but you can use a for loop instead.

``````use std::ops::Add;
use std::default::Default;

pub fn array_sums<T: Default+Copy+Add<Output=T>>(a: &[T], b: &[T]) -> Vec<T> {
let mut sums = vec![Default::default(); a.len()*b.len()];
for (chunk,&a) in sums.chunks_mut(b.len()).zip(a.iter()) {
for  (sum,&b) in chunk.iter_mut().zip(b.iter()) {
*sum = b + a;
}
}
sums
}``````

I think my problem is that I have big arrays (like 8388608 positions), so, I have my arrays and if I create another variable I am collapsing the use of memory. I think I should use it as iterators, without the need to save space for the sums, do it as "lazy", but I do not know if it is possible.

My goal is to turn this into efficient iterators.

``````for k in 0..BLOCK_SIZE {
for i in 0..BLOCK_SIZE {
for j in 0..BLOCK_SIZE {
let sum = kj[k * BLOCK_SIZE + j] + ik[i * BLOCK_SIZE + k];
let cell = &mut ij[i * BLOCK_SIZE + j];
if sum < *cell {
*cell = sum;
}
}
}
}``````

Have you looked at cache behavior? In your loops, `ij` and `kj` are accessed at stride 1, but `ik` is accessed at stride `BLOCKSIZE`. The compiler should compute it once in the `i` loop, so the large stride might not be an issue, but I'd check to make sure.

The code with 3 fors works well, but I think that it isn't rust idiomatic. I feel that this could be improved a lot using the correct iterators.

Iterators are nice, but there's no reason to force it. Quite often explicit loops are easier to understand.

2 Likes

But what about bounding checks ? With iterators disappear, or not ?

It looks like BLOCKSIZE is a const, so the compiler should be able to skip the checks. Your should verify that, of course.