How to avoid bounds checking?


#1

Checking today This Week in Rust I’ve stumbled on an issue that changes code to avoid bound checks.
I’ve also seen before another issue that I cannot find right now, that also avoided bound checks by doing something like this(from memory, not even 64.57% sure):

fn with_bounds(some_slice: &[usize]) {
    let len = some_slice.len();

    for it in 0..len {
        // calling some_slice[it] checks for bounds
    }
}

fn boundless(some_slice: &[usize]) {
    let temp_slice = &some_slice[0..];
    let len = temp_slice.len();
    
    for it in 0..len {
        // calling temp_slice[it] doesn't check for bounds
    }
}

fn main() {
    let some_vec = vec![10, 20, 30];
    with_bounds(&some_vec);
    boundless(&some_vec)
}

It changed the code that looks something like with_bounds into something that looks like boundless.
Is there a reliable way or clues of identifying the cases where we can help the compiler to detect when it can remove bound checks?


#2

I think the first hypothetical step is a compiler switch or something similar that makes the compiler list all the spots where it wasn’t able to remove bound checks.

But to create that switch you have to fetch the results of the optimizations done by the LLVM back-end. I think this can be done, but I guess it’s not very easy to do.

And different back-ends and different Rust compilers will have different capabilities in removing bound checks. So I think this whole work becomes a mess.

What you want is a way to be sure a certain array access is done without bound checks for every present and future Rust compiler and Rust compiler version, that uses any future back-end. And at the same time be sure that array access will be in-bounds. This sounds harder, but in practice it looks much more reasonable.


#3

I would expect the optimizer to elide all bounds checks in this particular example. This a trivial case: The loop condition is it < len. The bounds check condition is it < len, too, and so it can be optimized out. You can verify this by inspecting the optimized (release mode) code.

Still, in this situation, prefer iterators. Using for element in some_slice { is very easy. Use for (index, element) in some_slice.iter().enumerate() { if you need the index too.


#4

I cannot reproduce the first example having bounds checks, e.g.

#![crate_type = "lib"]
#![feature(test)]

extern crate test;

pub fn with_bounds(some_slice: &[usize]) {
    let len = some_slice.len();

    for it in 0..len {
        test::black_box(some_slice[it]);
    }
}

pub fn boundless(some_slice: &[usize]) {
    let some_slice = &some_slice[0..];
    let len = some_slice.len();
    
    for it in 0..len {
        test::black_box(some_slice[it]);
    }
}

produces exactly the same (optimised) asm for both functions: https://play.rust-lang.org/?gist=d89e229a78fc9f0940e5&version=nightly

However, in the past there has been some… LLVM difficulties with removing bounds checks on slices that were passed as function parameters, and simply explicitly moving them to let'd local variables improved the optimisations, i.e. fn boundless(some_slice: &[usize]) { let some_slice = some_slice; ... }. These may still exist.

That said, unless the loop is seriously tight (and the conventional some_slice.iter() doesn’t work) bounds checks are often unnoticable: they’re generally a comparison and a branch (i.e. two instructions), and the latter should be well predicted.


Is manually looping through a vector always strictly worse then using iterators?
#5

This already exists in the form of Iterator (for sequential access), and, more generally, via tricks such as @bluss’s indexing.


#6

Huh, finally found it…
I had issues remembering the actual code, but here I actually found it: https://github.com/rust-lang/rust/pull/30943/files#diff-91f9d2237c7851d61911b0ca64792a88L479

Pay attention that it previously was doing something like:

self = self[..self.len()];
let src = &src[..src.len()];
// where self.len() == src.len()

If you look up through the other code, you will see some changes like these:

// From:
out.clone_from_slice(buf);

// Into:
out[..buf.len()].clone_from_slice(buf);

Which I have the impression that was also done to avoid bound checks. Or just for consistency, to be sure that it is of the same size as the buffer length.


And here we see the fix, and actually the comment makes this particular case clear:


#7

Right, but it’s not general enough.

I think no one is using that… And I think the things I’m talking about need to be built-in. I’ll discuss this topic in another specific thread when I’ll know Rust better.


Learning Rust by writing a Z80 emulator (blog post series)
#8

Yes! The particular case of iterating two slices in lock step is a bit more complicated, I’ve described it in How to “zip” two slices efficiently.

It’s a trick that depends on some knowledge of what optimizes well and what doesn’t. Ideally this should not be needed in Rust (.zip() should handle this!) but we are not there yet. Libstd is typically the place where these kinds of implementation detail tricks are used to deliver the best code possible.

Edit: It looks like the first usage of this trick in rust was PR#17960