How are bounds checks optimized away?

I was wondering how rust can optimize away bounds checks from iterators in for loops.

So I took Vec as an example.

fn main() {
    let v = get_vec();;
    for e in v {
         foo(e);
    }
}

fn get_vec() -> Vec<i32> { ... }
fn foo(x: i32) { ... }

Looking at <std::vec::IntoIter as Iterator>::next() I see that there clearly are bounds checks

    fn next(&mut self) -> Option<T> {
        if self.ptr == self.end {
            None
        } else if T::IS_ZST {
            // `ptr` has to stay where it is to remain aligned, so we reduce the length by 1 by
            // reducing the `end`.
            self.end = self.end.wrapping_byte_sub(1);

            // Make up a value of this ZST.
            Some(unsafe { mem::zeroed() })
        } else {
            let old = self.ptr;
            self.ptr = unsafe { self.ptr.add(1) };

            Some(unsafe { ptr::read(old) })
        }
    }

Well, I guess that makes sense. After all, we could just call next() as many times as we want. But what happens in the loop to optimize bounds checks away?

What is meant by "optimizing away bounds checks" is optimizing redundant bounds checks inside the loop body, when indexing. For example:

let items = [1, 2, 3, 4];
for i in 0..items.len() {
    let item = items[i]; // redundant bounds check
}

Of course, the "bounds check" used for detecting the end of the iterator is not redundant, and can't be "optimized away".*


*: of course, if the iterator has a size known at compile time, and the compiler notices it, and it's sufficiently small, then the loop can be unrolled completely, resulting in a series of identical sections of machine code with no test/branch in-between. But that is simply impossible for iterators of dynamic size.

2 Likes

I think you understood me wrong. I know that it can only optimize away redundant bounds checks.

I was wondering if anyone can describe the mechanism by which rust figures out that these bounds checks are redundant.

Is it in the way ExactSizeIterator in a for loop gets desugared or is it down at the llvm level that these bounds checks get removed?

Imagine I wanna write an Iterator such that rust can remove the redundant bounds checks. What would I have to look out for? I guess it has something to do with the ExactSizeIterator trait, but I couldn't figure it out.

EDIT
What do you mean by the check for end of iterator cannot be removed?

Does that mean that my vector code from above effectively does do bounds checks at every iteration to retrieve e?

Or do you just mean in the next() method?

That should be the case, in principle. Possibly, LLVM might be able to do some loop unrolling and turn the "one bounds check per interaction" into something like "one bounds check per N iterations" for some small constant N.

“Bounds check” might even be the wrong term here though, as it might be understood as just some safety precaution, whereas the check in question is very much functional, as it's the "is this for loop over yet" check[1] that you simply need if the iterations is ever supposed to end.


The optimization of redundant bounds checks that could happen to @H2CO3's code example with the 0..items.len() is something that would only happen at the LLVM level, as far as I'm aware. In the general case, the optimization there would be that instead of two bounds checks per iteration, one for checking whether the index has reached the end of the range, and one for checking whether it's in-bounds when doing the slice or array indexing, the code could be optimized to be just a single bounds check per iteration. This improvement is what using iterators instead of indexing gives you reliably for free, without relying on the LLVM optimizer. (And they also have the benefit of working with a pointer range instead of indices: IIRC, those might optimize even better further down the line, e. g. for the loop unrolling I mentioned above.)

Of course for a fixed size array like in that very code example, the whole loop over 4 elements could also likely be unrolled completely by LLVM and thus all bounds checks eliminated.


  1. there's probably a more technical term for this, too ↩︎

1 Like

Sorry, I'm an idiot😂 I forgot that even a simple c for loop has a comparison each iteration.

1 Like

The iterator only has one bounds check - for end of list - not two - end of list and "is this a valid item". This is implicit in the design of the next method.

let iterator = collection.into_iter();
for item in iterator {
    …
}

desugars to something semantically the same as:

let iterator = collection.into_iter();
while let Some(item) = iterator.next() {
    …
}

and contains one bounds check inside iterator.next() to make sure that it's not overrun the end of the collection you're iterating over.

In contrast, the equivalent of C's for(size_t index = 0; index < collection.len(); ++index) { … } looks similar to:

let loop_indexes = 0..collection.len();
for index in loop_indexes {
    let item = collection[index];
    …
}

and desugars to the equivalent of:

let loop_indexes = 0..collection.len();
while let Some(index) = loop_indexes.next() {
    let item = collection[index];
    …
}

This contains two bounds checks - the first is in loop_indexes.next(), where it checks to see if the next value it's going to return is in range, the second is in collection[index] where it checks to see if index is less than collection.len() and panics if iit's not.

Now, in simple variants of that second case, LLVM can spot that loop_indexes will never be larger than collection.len(), and will thus remove the second check, but this isn't guaranteed.

Those are one and the same thing? You call the next() method (perhaps implicitly, via for sugar) to retrieve the next element.

It is not possible to remove this check, as the code should somehow decide whether or not to go on to the next iteration.

Well, std iterators are usually already written in without any bounds checks to begin with. So even though using an iterator instead of array indexing is an "optimization" that is performed by the programmer (by knowing how to use an iterator to achieve the same outcome), it is strictly speaking not a compiler optimization, and nothing is being "removed" or "optimized away". It's just that usually iterators are written using unsafe and raw pointers, which are known to be in-bounds, instead of indexing that incurs an additional branch upon every iteration, besides the check already performed by next().

How the compiler can prove that indexing itself is in-bounds is a completely different question. It usually involves something called polyhedral optimization, which is concerned with using affine transformations over indices to reduce them to a simple, "obviously" in-bound loop such as for i in 0..length { array[i] }.

1 Like

On second thought.. there is some relevant optimization even with the iterator approach, because naively, it would do

  • check for whether the end of the list is reached, and based on that choice produce a Some or None value returned from the next method
  • then the while let checks whether there is a Some or a None value to determine the actual control flow

I'm not 100% certain, but I would assume that optimizing this into code that more directly uses the bounds check from within the next method in order to decide whether to keep looping might be just a LLVM thing as well.

On that note, in general, this seems to be a thing where using a for_each method instead of calls to next has the opportunity to rely on less compiler optimizations; though of course compiler optimizations are often useful and reliable so e. g. the for_each implementation for slice iterators uses nothing else but a while let with .next() calls either.

1 Like

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.