Why `Iter` is slower than `loop` in this situation?

Bria · September 6, 2023, 1:18am

I'm trying to implement a custom iterator in a struct as following:

#[derive(Clone)]
pub struct MovingWindow<'a, T, N> {
    slice: &'a [T],
    n: N,
    count: usize,
    init_size: usize
}

impl<'a, 'b, T> Iterator for MovingWindow<'a, T, &'b Vec<usize>> {
    type Item = &'a [T];

    #[inline]
    fn next(&mut self) -> Option<Self::Item> {
        if self.count <= self.init_size {
            let i = self.count;
            let win = self.n[i - 1];
            let start_i = if i < win { 0usize } else { i - win };
            // println!("i: {i}, win: {win}, start_i: {start_i}");
            let ret = Some(&self.slice[start_i..i]);
            self.count += 1;
            ret
        } else {
            None
        }
    }
}

pub trait Rolling {
    type T;
    fn rolling<N>(&self, n: N) -> MovingWindow<Self::T, N>;
}

impl<N> Rolling for [N] {
    type T = N;
    fn rolling<K>(&self, n: K) -> MovingWindow<Self::T, K>
    {
        MovingWindow { slice: self, n, count: 1, init_size: self.len()}
    }
}

The iterator's Item is a Slice, the len of which depends on the MovingWindow's field n(a Vec<usize>).
And I have the following trait on [T], which would do the same thing as above but with using loop:

pub trait RollWindow: AsRef<[f32]> {
    fn roll_window_sum(&self, window: &[usize]) -> Vec<f32> {
        let data = self.as_ref();
        let mut res = vec![f32::NAN; data.len()];
        for (i, win) in window.iter().enumerate() {
            let start_i = if i + 1 < *win { 0 } else { i + 1 - win };
            let data_part = &data[start_i..i + 1];
            res[i] = data_part.iter().sum::<f32>()
        }
        res
    }
}
impl RollWindow for [f32] {}
impl RollWindow for Vec<f32> {}

I tested the performace of both methods as following:

let data1 = vec![1f32; 100_000_000];
let data2 = vec![10usize; 100_000_000];


t!(data1.rolling(&data2).map(|x| x.iter().sum::<f32>()).collect::<Vec<f32>>());//test the iter method
//344.632407ms

t!(data1.roll_window_sum(&data2));//test the loop method
//280.973004ms

This is the playground

I use both Iter(the standard iterator, not implementing by myself) and loop a lot, and I didn't find any significant performance difference between them. What am I missing about the implementing above that make the iter slow?

cccccccc · September 6, 2023, 2:56am

Do you use the release mode?

Bria · September 6, 2023, 2:57am

Yes, I test it with release mode.

danjl1100 · September 6, 2023, 3:01am

I would be suspicious of performing a benchmark this way. You may want to look into cargo-bench or some other benchmarking framework. These will run the test many, many times to try to average out the ambient system noise, and avoid the effect of "cold start" where the CPU cache may not be filled.

Not sure whether you'll see a large difference, but a quick first step could be to reverse the order of the tests, and run it a few times.

steffahn · September 6, 2023, 3:11am

This could be down to indexing into the &Vec<usize> based on indices being slower than going through it with a proper iterator. You could try saving a std::iter::Enumerate<std::slice::Iter<'_, usize> in your iterator too, instead of the &Vec<usize>, to see if that approach removes the difference in performance.

user16251 · September 7, 2023, 1:13am

Could it have to do with the collect? In the second function you preallocate the output.

Have you compared the assembly from both implementations?

Topic		Replies	Views
We all know `iter` is faster than `loop`, but why?	23	14341	February 16, 2021
Simple iter 5x slower than a for loop?	14	1071	November 12, 2020
Iterators faster than for loops in parallel help	3	718	August 9, 2020
Performance difference between iterator and for loop	8	4434	January 17, 2021
Why iterating over the iterator is 3x slower help	8	1109	October 7, 2021

Why `Iter` is slower than `loop` in this situation?

Related topics