Why can't rustc optimize out bounds checks in this code?

https://rust.godbolt.org/z/Ka59jrbxM

pub fn foo(a: &mut [i32], n: usize) {
    if n > a.len() {
        return;
    }
    let mut i = 1;
    while i < n {
        a[i] += 1;
        i += 1;
    }
}
.LBB0_5:
        cmp     r9, rcx
        je      .LBB0_8
        inc     dword ptr [rdi + 4*rcx]
        inc     rcx
        cmp     rdx, rcx
        jne     .LBB0_5
.LBB0_7:
        pop     rax
        ret
.LBB0_8:
        lea     rdx, [rip + .L__unnamed_1]
        mov     rdi, r9
        call    qword ptr [rip + core::panicking::panic_bounds_check@GOTPCREL]
        ud2

This is a classic place where re-slicing will do what you want: https://rust.godbolt.org/z/4KGMMaxsq

pub fn foo(a: &mut [i32], n: usize) {
    if n > a.len() {
        return;
    }
    let a = &mut a[..n]; // <--
    let mut i = 1;
    while i < n {
        a[i] += 1;
        i += 1;
    }
}

Or, better, you can use the function to do the conditional behaviour better than the manual code, which also works: https://rust.godbolt.org/z/W44cET7rW

pub fn foo(a: &mut [i32], n: usize) {
    let Some(a) = a.get_mut(..n) else { return };
    let mut i = 1;
    while i < n {
        a[i] += 1;
        i += 1;
    }
}

Though of course better still is to just stop indexing manually...

pub fn foo(a: &mut [i32], n: usize) {
    a.get_mut(..n).map(|a| a.iter_mut().skip(1).for_each(|x| *x += 1));
}

https://rust.godbolt.org/z/1ff6E6Pjb

EDIT: fixed the last example. Thanks, @steffahn !

13 Likes

This should probably use get_mut(1..n) in order to be equivalent?

2 Likes

Oops, I changed the approach and forgot to put stuff back. Thanks for catching it!

I used skip(1) instead of 1..n in an attempt to keep the code more similar in steps.

Hard to say what the right way to do it would be without a real example. After all, passing an n like that is a bit odd -- it's unclear why the caller didn't do the slicing instead of making it another parameter.

2 Likes

It's good to know this technique, but why doesn't the compiler eliminate the bounds check in the original post?

Because the compiler isn't (for whatever reason) tracking the fact that if n <= a.len() and i < n, then i < a.len().

Compiler Explorer has an example where this is fixed up; because we have an explicit check that i < a.len(), the bounds check becomes trivially true, and is removed. However, the compiler checks both i < a.len() and i < n, which tells me that the compiler isn't spotting that the if bounds n's value to n <= a.len() and using that information.

I can see that if I swap round to Compiler Explorer then the compiler still isn't able to spot that i < n implies i < a.len(), so that's the missing bit of information that the compiler needs to optimize.

Of course, in this case, iterators or re-slicing also work just fine, and iterators are almost always the better answer.

3 Likes

Interestingly, gcc can optimize this c++ equivalent while clang can't.

1 Like

My hypothesis: because bounds elimination works by looking for canonical forms, so you have to actually use a canonical form.

Roughly, the bounds check is against the length of the slice. So if you're looping to the length of the slice, then LLVM sees the loop induction variable, sees that it's the same bound as the loop predicate, and can easily remove it. If it's anything else, you're at the mercy of what it happens to have implemented.

Compilers aren't magic, and people aren't willing for the compiler to run a full SMT solver on every condition to see if it can be removed, because that'd be horrifically slow. So your job as a programmer is to write obvious, ordinary code so that it's easy for the optimizer -- and humans, too! -- to see what's happening. Whenever possible, loop over the whole slice. Don't take a slice and something else, have the caller slice it for you. (They can call foo(&a[..n]) just fine; they don't need foo(a, n).)

But as always, if you find something you think that LLVM should be optimizing but isn't, open an LLVM issue :slight_smile:

(Especially if you can repro it in clang. For example, if the same thing using std::vector::at in C++ doesn't optimize away the bounds checks, then it's an LLVM thing, not a Rust thing.)

3 Likes

LLVM has a constraint elimination pass that is supposed to (and can in some cases) recognize that less-than is transitive. I don't think this optimization is in the "magical" realm. Whether you should write code like this is a separate question (and I agree that you shouldn't). Code that highlights a compiler oddity usually isn't "good"!

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.