Hare and tortoise indexes analysis

In LLVM I'd like a new pass able to understand the indices ranges in code like this:

pub fn compact_nonzeros(data: &mut [u32]) -> &[u32] {
    let mut pos = 0;
    for i in 0 .. data.len() {
        if data[i] != 0 {
            data[pos] = data[i];
            pos += 1;
        }
    }
    &data[.. pos]
}

This code seems currently beyond the ability of LLVM13 to infer at compile time that both indexing places (where pos is used) are safe.

(This code is just an example, there are tons of situations similar to this one where you can't convert the code to stdlib functions or other tricks).

2 Likes

To make this happen, you'd probably need to convert the code to equivalent C++ with vector.at(), and file an issue in the LLVM project. Maybe there is one already.

4 Likes

There will always be things that you can know are true but aren't provable by the compiler. In a case like this, where the relevant invariants are guaranteed by the code around it, I'd have no reservations about using unsafe and get_unchecked(_mut) to remove the spurious bounds checks. After all, that's what it's for: to tell the compiler "I know you can't prove it, but trust me."

In my code where I can't invent a sufficiently readable trick to help the compiler remove the bound tests, I keep them in most cases, because I like Rust memory safety. Eventually we'll have built-in ergonomic ways to prove Rust that some index is in bound, some value is nonzero, etc (a promising way seems this one to me: GitHub - google/wuffs: Wrangling Untrusted File Formats Safely ).

2 Likes

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.