Range/Interate over values is faster than doing so by index?

I have this 2 blocks of code:

First one:

let start1 = Instant::now();
    for _ in 0..5000 {
        for i in 0..vals.len() {
            f1 += f64::powf(vals[i] as f64, 0.25)
        }
        f1 = f64::powf(f1, 0.25)
    }
    let duration1 = start1.elapsed();
    println!("Time elapsed in case 1 is: {:?}", duration1);

Second one:

    let start2 = Instant::now();
    for _ in 0..5000 {
        for v in vals2.clone() {
            f2 += f64::powf(v as f64, 0.25)
        }
        f2 = f64::powf(f2, 0.25)
    }
    let duration2 = start2.elapsed();
    println!("Time elapsed in case 2 is: {:?}", duration2);

There is not much complexity in what does this code do. The first one iterate over a vec by index, and the second one interate using value. However, I am very much intrigued by the execution time:

Time elapsed in case 1 is: 6.8724305s
Time elapsed in case 2 is: 2.2254849s

For some reason, the code in case 2 is 3 time faster in execution. Time may vary between computer, but the difference to this degree is certainly not coincident. In case 2, there is even a clone operation which did not take place in the first case.

So why ranging over a vec using its index is so inefficient compared to using value?
Thank you.

Are you compiling in release mode?

1 Like

In general, iterators tend to be faster because the bounds checking is a part of the "should I continue looping" check, whereas with indexing we are doing bounds checks on every access. The iterator form also tends to be easier to optimise.

Normally, the difference between iterators and indexing is so small it's negligible, but casting to a f64 and calling powf() is so trivial that those bounds checks start to add up[1].

For some concrete numbers and a reproducible example, I used this code (I had to modify your version because some variables were missing and to remove a clone that was only present when iterating) to get the following results when compiling in debug mode (no optimisations) and running on the playground:

Time elapsed in case 1 (indexing) is: 486.970715ms (21.32742721914226)
Time elapsed in case 2 (iterators) is: 380.353951ms (21.32742721914226)

However, when I switch to release mode, the optimiser is able to optimise both versions to pretty much the same machine code so our timings are essentially identical.

Time elapsed in case 1 (indexing) is: 85.951826ms (21.20522459912267)
Time elapsed in case 2 (iterators) is: 85.927285ms (21.20522459912267)

(note: I have to print f1 and f2, otherwise we get "elapsed" times of 30-80ns because the optimiser will see their result is never used and remove the loops entirely)


  1. ... At least in debug mode, as we'll see later. ↩ī¸Ž

2 Likes