It seems that the compiler cannot vectorize the flat_map, but you can use a for loop instead:
use std::ops::Add;

pub fn array_sums<T: Default + Copy + Add<Output = T>>(a: &[T], b: &[T]) -> Vec<T> {
    let mut sums = vec![T::default(); a.len() * b.len()];
    for (chunk, &a) in sums.chunks_mut(b.len()).zip(a.iter()) {
        for (sum, &b) in chunk.iter_mut().zip(b.iter()) {
            *sum = b + a;
        }
    }
    sums
}
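For example, with small hypothetical inputs, the result is laid out row-major: one row per element of a, one column per element of b:

```rust
use std::ops::Add;

// array_sums as given in the answer above.
pub fn array_sums<T: Default + Copy + Add<Output = T>>(a: &[T], b: &[T]) -> Vec<T> {
    let mut sums = vec![T::default(); a.len() * b.len()];
    for (chunk, &a) in sums.chunks_mut(b.len()).zip(a.iter()) {
        for (sum, &b) in chunk.iter_mut().zip(b.iter()) {
            *sum = b + a;
        }
    }
    sums
}

fn main() {
    // One row per element of `a` (here [1, 2]),
    // one column per element of `b` (here [10, 20, 30]).
    let sums = array_sums(&[1, 2], &[10, 20, 30]);
    assert_eq!(sums, vec![11, 21, 31, 12, 22, 32]);
}
```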
I think my problem is that I have big arrays (around 8388608 elements), so allocating another array of that size blows up my memory usage. I think I should treat them as iterators and compute the sums lazily, without reserving space for them, but I don't know if that is possible.
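A lazy version is possible; here is a minimal sketch (the name lazy_sums is mine) that yields each sum on demand with flat_map instead of collecting into a Vec. Note the earlier caveat that the compiler may not vectorize flat_map, so measure before committing to it:

```rust
use std::ops::Add;

// Yield the cross-sums of `a` and `b` lazily: nothing is allocated,
// each sum is computed only when the iterator is advanced.
fn lazy_sums<'a, T>(a: &'a [T], b: &'a [T]) -> impl Iterator<Item = T> + 'a
where
    T: Copy + Add<Output = T> + 'a,
{
    a.iter().flat_map(move |&x| b.iter().map(move |&y| x + y))
}

fn main() {
    let a = [1u32, 2];
    let b = [10u32, 20];
    // Consume the sums streaming, e.g. find the minimum without a Vec.
    let min = lazy_sums(&a, &b).min().unwrap();
    assert_eq!(min, 11);
    println!("min sum = {min}");
}
```

This only helps if you can fold or stream over the sums; if you truly need all of them materialized at once, the allocation is unavoidable.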
for k in 0..BLOCK_SIZE {
    for i in 0..BLOCK_SIZE {
        for j in 0..BLOCK_SIZE {
            let sum = kj[k * BLOCK_SIZE + j] + ik[i * BLOCK_SIZE + k];
            let cell = &mut ij[i * BLOCK_SIZE + j];
            if sum < *cell {
                *cell = sum;
            }
        }
    }
}
Have you looked at cache behavior? In your loops, ij and kj are accessed at stride 1, but ik is accessed at stride BLOCK_SIZE. Since ik[i * BLOCK_SIZE + k] doesn't depend on j, the compiler should hoist that load out of the inner j loop, so the large stride might not be an issue, but I'd check the generated code to make sure.
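If you'd rather guarantee the hoist than trust the optimizer, you can do it by hand. A sketch (the function name relax_block, the u32 element type, and the BLOCK_SIZE value are assumptions; the indexing matches your snippet):

```rust
const BLOCK_SIZE: usize = 4; // hypothetical; use your real block size

// One pass of the blocked relaxation from the question, with the strided
// `ik` load hoisted out of the inner `j` loop by hand.
fn relax_block(ij: &mut [u32], ik: &[u32], kj: &[u32]) {
    for k in 0..BLOCK_SIZE {
        for i in 0..BLOCK_SIZE {
            // ik[i * BLOCK_SIZE + k] does not depend on j: load it once here.
            let ik_val = ik[i * BLOCK_SIZE + k];
            for j in 0..BLOCK_SIZE {
                let sum = kj[k * BLOCK_SIZE + j] + ik_val;
                let cell = &mut ij[i * BLOCK_SIZE + j];
                if sum < *cell {
                    *cell = sum;
                }
            }
        }
    }
}

fn main() {
    // With every ik and kj entry equal to 1, each candidate sum is 2,
    // so every cell of ij drops from 100 to 2.
    let mut ij = vec![100u32; BLOCK_SIZE * BLOCK_SIZE];
    let ik = vec![1u32; BLOCK_SIZE * BLOCK_SIZE];
    let kj = vec![1u32; BLOCK_SIZE * BLOCK_SIZE];
    relax_block(&mut ij, &ik, &kj);
    assert_eq!(ij, vec![2u32; BLOCK_SIZE * BLOCK_SIZE]);
}
```

The inner loop then touches only stride-1 data (kj and ij), which is the access pattern the vectorizer handles best.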