How to “zip” two slices efficiently

I went on a similar exploration trying to improve a dot product some time ago. I found that rustc isn't great at unrolling loops, so manually unrolling gives a significant speed boost:

use std::cmp;

// I'd love it if this were the fastest version :(
fn naive_dot_product(x: &[f64], y: &[f64]) -> f64 {
    x.iter().zip(y.iter())
        .fold(0.0, |sum, (&ex, &ey)| sum + (ex * ey))
}

// The method you describe.
fn index_dot_product(x: &[f64], y: &[f64]) -> f64 {
    let n = cmp::min(x.len(), y.len());
    let (x, y) = (&x[..n], &y[..n]);
    let mut sum = 0.0;
    for i in 0..n {
        sum += x[i] * y[i];
    }
    sum
}

// Advance the slices in place, accumulating 8 products at a time.
fn unrolled_dot_product(x: &[f64], y: &[f64]) -> f64 {
    let n = cmp::min(x.len(), y.len());
    let (mut x, mut y) = (&x[..n], &y[..n]);

    let mut sum = 0.0;
    while x.len() >= 8 {
        sum += x[0] * y[0] + x[1] * y[1] + x[2] * y[2] + x[3] * y[3]
             + x[4] * y[4] + x[5] * y[5] + x[6] * y[6] + x[7] * y[7];
        x = &x[8..];
        y = &y[8..];
    }

    // Take care of any leftover elements (when len is not divisible by 8).
    x.iter().zip(y.iter()).fold(sum, |sum, (&ex, &ey)| sum + (ex * ey))
}
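
As an aside, on newer compilers the same 8-wide blocking can be written with chunks_exact, which stays in iterator land and handles the tail via remainder(). This is just a sketch of that variant under the same assumption (both slices trimmed to equal length); I haven't benchmarked it:

// Same 8-wide unrolling, expressed with chunks_exact (untested sketch).
fn chunked_dot_product(x: &[f64], y: &[f64]) -> f64 {
    let n = cmp::min(x.len(), y.len());
    let (x, y) = (&x[..n], &y[..n]);

    let mut xc = x.chunks_exact(8);
    let mut yc = y.chunks_exact(8);

    let mut sum = 0.0;
    for (cx, cy) in (&mut xc).zip(&mut yc) {
        sum += cx[0] * cy[0] + cx[1] * cy[1] + cx[2] * cy[2] + cx[3] * cy[3]
             + cx[4] * cy[4] + cx[5] * cy[5] + cx[6] * cy[6] + cx[7] * cy[7];
    }

    // Both remainders are the equal-length tails left over after the 8-wide chunks.
    xc.remainder().iter().zip(yc.remainder())
        .fold(sum, |sum, (&ex, &ey)| sum + (ex * ey))
}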

These are the results I get for arrays with 633965339 and 65527 elements:

test bench_naive    ... bench:   2,910,196 ns/iter (+/- 82,582)
test bench_index    ... bench:   2,562,601 ns/iter (+/- 87,100)
test bench_unrolled ... bench:     987,744 ns/iter (+/- 39,174)

Even for smaller vecs (259 and 253 elements), the speed-up is significant:

test bench_naive    ... bench:      13,449 ns/iter (+/- 151)
test bench_index    ... bench:      11,869 ns/iter (+/- 97)
test bench_unrolled ... bench:       4,994 ns/iter (+/- 64)
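
For reference, output like the above comes from #[bench] functions run with the nightly test crate. A minimal sketch of one such benchmark could look like this (the input sizes and values are illustrative guesses, not the exact setup from the playground link):

#![feature(test)]
extern crate test;

use test::{black_box, Bencher};

#[bench]
fn bench_unrolled(b: &mut Bencher) {
    // Illustrative inputs only; the original benchmark data may differ.
    let x: Vec<f64> = (0..65527).map(|i| i as f64).collect();
    let y: Vec<f64> = (0..65527).map(|i| (i as f64) * 0.5).collect();
    b.iter(|| black_box(unrolled_dot_product(&x, &y)));
}

// bench_naive and bench_index look the same, just calling the other functions.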

Code available at: Shared via Rust Playground · GitHub
