Optimizing away bounds checks when zipping two iters of the same length

Consider this function (Rust Playground):

fn bla(a: Vec<i64>, b: Vec<i64>) -> i64 {
    let mut r = 0i64;
    assert!(a.len() == b.len());
    for (x, y) in a.iter().zip(b.iter()) {
        r += x + y;
    }
    r
}

Does rustc optimize away the bounds checks in the loops because it knows from the assert! that the two zipped vectors have the same length?

I think it will help.

EDIT: Sorry, I was thinking about indexing. Iterators iterate by pointer internally, so the assert wouldn't matter.

The stdlib has some optimizations that remove the per-iteration checks when iterating over a zip, even if you didn't have that assert.

Suppose the compiler did not know that a and b are of the same length. Then it would need to bounds-check both the iterator over a and the one over b (assuming a certain implementation of zip), right? Otherwise, just one check would be enough.

For Vec and a couple of other iterator types, the stdlib has an optimization that lets it pick the shorter of the two lengths and iterate over elements from 0 up to that value. It does so without bounds checks, since it already picked the shorter length. rust/library/core/src/iter/adapters/zip.rs at ce63e5d9ea20f15a70c6fdc4d4de7aee352fd965 · rust-lang/rust · GitHub
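The effect of that specialization can be sketched in safe code. This is not the stdlib's actual implementation (which uses internal unsafe machinery); it only illustrates the idea of picking the shorter length once so the optimizer can elide the per-element checks:

```rust
// Sketch only: the idea behind the zip specialization, not the real stdlib code.
fn sum_zip_like(a: &[i64], b: &[i64]) -> i64 {
    // Pick the shorter of the two lengths once, up front.
    let n = a.len().min(b.len());
    // Slicing both inputs to `n` makes the bound visible to the optimizer,
    // so the indexing below needs no per-iteration bounds checks.
    let (a, b) = (&a[..n], &b[..n]);
    let mut r = 0i64;
    for i in 0..n {
        r += a[i] + b[i];
    }
    r
}
```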

Note that you're making the compiler's life more difficult here by writing this over Vecs, as well as making your function less general. Write it taking slices instead (since you don't use the ownership here) and it becomes much more obvious that there's only the panic from the assert, no other checks: https://rust.godbolt.org/z/9bMc7oTxE.
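For reference, the slice-taking variant is the same function with only the signature changed; callers that own Vecs just pass `&a`:

```rust
// Same loop as before, but borrowing slices instead of taking ownership of Vecs.
// More general, and easier for the compiler to reason about.
fn bla(a: &[i64], b: &[i64]) -> i64 {
    let mut r = 0i64;
    assert!(a.len() == b.len());
    for (x, y) in a.iter().zip(b.iter()) {
        r += x + y;
    }
    r
}
```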

My usual advice:

My original reason to ask this question was that my program became slower with

for (&a, &b, &c) in izip!(&A, &B, &C)

compared to the original

for (i, &a) in A.iter().enumerate() {
  let b = B[i];
  let c = C[i];
  // ...
}

That's not nearly enough to say anything useful in response.

Optimization is non-local; what happens outside a snippet can affect how that snippet performs.

Provide a proper criterion benchmark showing the difference. Otherwise there's not enough for anyone to tell you anything useful.
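Criterion is the right tool because it handles warm-up, iteration counts, and statistical noise for you. Purely to illustrate what is being compared here (two loop shapes over the same data), a crude stopwatch sketch could look like this; the `black_box` calls keep the compiler from deleting the work, but this is not a substitute for a real benchmark:

```rust
use std::hint::black_box;
use std::time::Instant;

// Variant 1: iterate both slices in lockstep with zip.
fn sum_zip(a: &[i64], b: &[i64]) -> i64 {
    let mut r = 0i64;
    for (x, y) in a.iter().zip(b.iter()) {
        r += x + y;
    }
    r
}

// Variant 2: enumerate one slice and index into the other,
// as in the original loop from the question.
fn sum_indexed(a: &[i64], b: &[i64]) -> i64 {
    let mut r = 0i64;
    for (i, &x) in a.iter().enumerate() {
        r += x + b[i];
    }
    r
}

fn main() {
    let a: Vec<i64> = (0..1_000_000).collect();
    let b: Vec<i64> = (0..1_000_000).collect();

    let t = Instant::now();
    let r1 = black_box(sum_zip(&a, &b));
    println!("zip:       {:?}", t.elapsed());

    let t = Instant::now();
    let r2 = black_box(sum_indexed(&a, &b));
    println!("enumerate: {:?}", t.elapsed());

    assert_eq!(r1, r2);
}
```

A single stopwatch run like this is noisy and easily skewed by caching and CPU frequency scaling, which is exactly why criterion exists.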

Thanks. It would take too much time to create an example, so I accepted your answer above.

Hey, looks like we both love to think about low-level optimizations and how exactly the compiler generates assembly.
Most, if not all, questions about generated machine code can be answered in the following ways; at least, that's how I do it myself.

--

As scotmcm already mentioned, a few of these things you can answer yourself using godbolt.org.
Here is a hotlink to a Rust hello-world snippet with compiler arguments that emulate cargo building for release, plus targeting my native CPU.
Adjust as needed.

Keep in mind that, as mentioned before, what the compiler generates for a snippet out of context might not match what it generates for the full project.
Depending on many factors (for example, whether LTO is enabled), the compiler will emit vastly different ASM.
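For instance, settings like LTO and codegen units live in the project's release profile in Cargo.toml, which a standalone godbolt snippet never sees. The keys below are from Cargo's profile configuration; the specific values are just an example:

```toml
# Cargo.toml: release-profile settings that can noticeably change the emitted ASM.
[profile.release]
opt-level = 3      # maximum optimization (the release default)
lto = "fat"        # whole-program link-time optimization
codegen-units = 1  # fewer units allow more cross-function inlining
```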

--

Another useful thing to check out could be cargo-show-asm.
I have not used it in a real-world scenario yet, but found it while searching for an answer to a problem in the same vein as yours.
My short test before sending this message was very promising; I think it is a very neat tool that you will value in the future.
From what I know, there is/was another crate with a very similar name, which was abandoned; this one is maintained.
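Typical usage, sketched from the tool's README (the function path here is a placeholder for whatever you want to inspect):

```shell
# Install once, then ask for the assembly of a specific function.
cargo install cargo-show-asm
cargo asm my_crate::bla
```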

--

The only other way to be 100% sure what is generated is to compile the project fully and then analyze the binary with something like Ghidra.
Hope this is useful for your low-level endeavors 🙂

--

Edit: I realized your real question was about WHY the checks are optimized away; oops.
I will leave this message here, since it is still useful and might come in handy for verifying your assumptions yourself by modifying your code and retesting.