Iterators vs index loops performance

Consider the following two equivalent functions

fn func_index(vec: &mut Vec::<usize>) {
    for i in 0..vec.len() {
        vec[i] = i;
    }
}
fn func_iter(vec: &mut Vec::<usize>) {
    for (i, el) in vec.iter_mut().enumerate() {
        *el = i;
    }
}

I assumed that rustc is able to see the equivalence and produce the same assembly code for both (at least in release mode). Unfortunately this is not true:

test tests::bench_index ... bench:       4,773 ns/iter (+/- 128)
test tests::bench_iter  ... bench:       2,197 ns/iter (+/- 49)

#[bench]
fn bench_index(b: &mut Bencher) {
    let mut vec = vec![0; 10000];
    b.iter(|| func_index(&mut vec));
}
#[bench]
fn bench_iter(b: &mut Bencher) {
    let mut vec = vec![0; 10000];
    b.iter(|| func_iter(&mut vec));
}

Does anyone know why rustc is not able to generate the same assembly for both?

1 Like

The problem is that with the mutable pointer to a Vec LLVM assumes you could potentially modify the length while iterating and this prevents it from removing the bound check.

Of course this is not the case, however the specific optimization that rustc can use to tell LLVM it isn't actually the case is currently disabled because sometimes it causes LLVM to miscompile programs. You can enable it anyway with the flag -Zmutable-noalias=yes, with this it will generate the same assembly for both (proof).

Alternatively your function could take a &mut [usize]. This removes a layer of indirection, allowing LLVM to see that the slice reference itself is never modified and so it can be sure the length is never modified as well.

8 Likes

Thank you, that makes sense.

This came up recently in We all know `iter` is faster than `loop`, but why? if you're interested in more discussion about it.

I've filed a suggestion to make it a clippy lint:

2 Likes

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.