Simple iter 5x slower than a for loop?

Hi,

out of pure curiosity and coincidence, I've tried to benchmark the following simple example:

pub fn test_iter() -> Vec<usize> {
    (0..12)
        .filter(|x| x % 2 == 0)
        .map(|x| x * x)
        .collect()
}

pub fn test_for() -> Vec<usize> {
    let n = 12;
    let mut result = Vec::with_capacity(n);
    for x in 0..n {
        if x % 2 == 0 {
            result.push(x * x);
        }
    }
    result
}

To my surprise, the for loop performs 5x faster than the .iter() version on my Core-i7, rust 1.42.

$ cargo bench
   Compiling bench_iter v0.1.0 (/home/td/projects-sandbox/rust_bench_iter)
    Finished bench [optimized] target(s) in 3.47s
     Running target/release/deps/bench_iter-e0df5c1fe8074663

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

     Running target/release/deps/my_benchmark-7c1f3cac9736b209
iter                    time:   [123.41 ns 123.51 ns 123.62 ns]                 
                        change: [-28.213% -28.099% -27.980%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) low mild
  4 (4.00%) high mild
  1 (1.00%) high severe

for                     time:   [23.755 ns 23.777 ns 23.799 ns]                 
                        change: [+0.4685% +0.6315% +0.8007%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) high mild
  3 (3.00%) high severe

Is it me doing something stupid with the benchmark, or is actually Rust performing poorly for this particular case?

Repo with the benchmark can be found on GitHub

I'd expect Vec::from_iter to call Iterator::size_hint, which returns (0, Some(12)) in this case.

Actually, it does call size_hint, but it uses the lower bound. That seems odd to me.

1 Like

Did you try to run this benchmark with larger vectors?

For 1200 elements (just added two zeroes), the results are much better

     Running target/release/deps/my_benchmark-7c1f3cac9736b209
iter                    time:   [1.4242 us 1.4257 us 1.4272 us]                  
                        change: [-0.2425% -0.0512% +0.1491%] (p = 0.60 > 0.05)
                        No change in performance detected.
Found 8 outliers among 100 measurements (8.00%)
  1 (1.00%) low severe
  2 (2.00%) low mild
  4 (4.00%) high mild
  1 (1.00%) high severe

for                     time:   [1.3622 us 1.3633 us 1.3644 us]                 
                        change: [+11.248% +11.412% +11.587%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild
  1 (1.00%) high severe

The .iter() solution is now more or less on par with the for() loop. So I guess it's mostly the setup that's a lot more expensive with the iter() version. I feel a lot better now, although it still seems there can be a major performance drop if the iter() version with low iteration count is used in a tight loop.

1 Like

Also there is chapter in the Rust book about performance of iterators: https://doc.rust-lang.org/book/ch13-04-performance.html

Sadly I'm on mobile right now and can't test it myself:
What happens if you put the iterator in its own type and provide a custom size_hint (or even ExactSizeIterator)?
Or the other way around: replace the with_capacity with just a new. I would imagine the speed should be the same then.

1 Like

Changed Vec::with_capacity to Vec::new() (I didn't realize that filtering changes the size_hint() result, my bad).

n = 12:

iter                    time:   [120.44 ns 120.53 ns 120.62 ns]                 
for                     time:   [163.94 ns 164.05 ns 164.16 ns]                

n = 1200:

iter                    time:   [1.4734 us 1.4745 us 1.4757 us]                  
for                     time:   [1.9668 us 1.9685 us 1.9701 us]                 

iter() now wins for both cases, because for() is not cheating anymore with with_capacity(). Thank you, that was fun :slight_smile:

1 Like

and with this?

pub fn test_for() -> [usize; 6] {
    let n = 12;
    let mut index = 0;
    let mut result: [usize; 6] = [0; 6];
    for x in 0..n {
        if x % 2 == 0 {
            result[index] = x * x;
            index += 1;
        }
    }
    result
}

Your array version finishes in 1.4ns (1/100 the time). The assembly says it all:

example::test_array:
        mov     rax, rdi
        mov     qword ptr [rdi], 0
        mov     qword ptr [rdi + 8], 4
        mov     qword ptr [rdi + 16], 16
        mov     qword ptr [rdi + 24], 36
        mov     qword ptr [rdi + 32], 64
        mov     qword ptr [rdi + 40], 100
        ret
2 Likes

Oh i'm suprised that in this version the Iterator version is still way slower

pub fn test_iter_new() -> Vec<usize> {
    (0..1200)
	.filter(|x| x % 2 == 0)
	.map(|x| x * x)
	.collect()
}

pub fn test_iter_with_capacity() -> Vec<usize> {
    let n = 1200;
    (0..n)
        .filter(|x| x % 2 == 0)
        .map(|x| x * x)
        .fold(Vec::with_capacity(n), |mut result, x| {
            result.push(x);
            result
        })
}

pub fn test_for_with_capacity() -> Vec<usize> {
    let n = 1200;
    let mut result = Vec::with_capacity(n);
    for x in 0..n {
        if x % 2 == 0 {
            result.push(x * x);
        }
    }
    result
}
test benchs::iter_new           ... bench:       2,136 ns/iter (+/- 55)
test benchs::iter_with_capacity ... bench:       7,533 ns/iter (+/- 131)
test benchs::for_with_capacity  ... bench:       1,419 ns/iter (+/- 32)

rustc 1.46.0-nightly (346aec9b0 2020-07-11)

I once tried to make it smarter, but that turned out not to be a win (at least in terms of rustc speed):

Some other potentially-interesting conversations in

2 Likes

I notice that so far all the with_capacitys are overallocating by a factor of 2, which probably does not affect speed very much, but is wasteful for large n.

FWIW I would use extend rather than fold if you wanted to combine the approaches:

pub fn test_iter_with_extend() -> Vec<usize> {
    let n = 1200;
    let mut result = Vec::with_capacity((n + 1) / 2);
    result.extend((0..n).filter(|x| x % 2 == 0).map(|x| x * x));
    result
}

No idea if this is faster, it just seems more natural to me for gathering into a collection.

1 Like

oh forgot about extend it is as fast as test_for_with_capacity

test benchs::iter_with_extend   ... bench:       1,438 ns/iter (+/- 20)
1 Like

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.