Differing valid value optimizations for single value and slice

Obviously, you should validate your values before you construct them. However, for the sake of this question, let’s say you have the following code.

use std::num::NonZeroU32;

pub fn validate_nzu32(x: NonZeroU32) {
    if x.get() == 0 {
        // Use abort to not clutter the asm with panic code.
        ::std::process::abort();
    }
}

pub fn validate_nzu32_slice(xs: &[NonZeroU32]) {
    for x in xs {
        validate_nzu32(*x);
    }
}

example::validate_nzu32:
        push    rax
        test    edi, edi
        je      .LBB0_1
        pop     rax
        ret
.LBB0_1:
        call    std::process::abort@PLT
        ud2

example::validate_nzu32_slice:
        ret

Code on rust.godbolt.org

Can anyone tell me why the slice case is optimized out (meaning it’s not doing what you think!) but the single value version isn’t? The fact that this can sometimes “work” is dangerous if someone experimentally establishes that you can do this and then continues to develop under this assumption.

Note: When doing the same thing for bools, the single value function is reduced to nothing, as you would (or wouldn’t) expect.
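For illustration, here is a minimal sketch of what "the same thing for bools" might look like. The original code for the bool case is not shown, so the function name validate_bool and the choice to check the value through a cast to u8 are assumptions.

```rust
// Hypothetical bool counterpart of validate_nzu32; not taken from the original post.
pub fn validate_bool(b: bool) {
    // A valid bool is always 0 or 1 when cast to u8, so the compiler is
    // entitled to treat this condition as always false and drop the branch.
    let raw = b as u8;
    if raw > 1 {
        // Use abort to not clutter the asm with panic code.
        ::std::process::abort();
    }
}
```

Calling validate_bool(true) or validate_bool(false) simply returns; the check can never fire for a validly constructed bool.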

If you change validate_nzu32 to take a reference, the check is optimized out as well. So the question becomes why that doesn’t happen when the argument is passed by value.
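A minimal sketch of the by-reference variant, assuming it is written the obvious way; the name validate_nzu32_ref is made up for this example.

```rust
use std::num::NonZeroU32;

// Hypothetical by-reference variant of validate_nzu32 from the first post.
pub fn validate_nzu32_ref(x: &NonZeroU32) {
    // The value is now read through a pointer, so the compiler may assume
    // the loaded value is nonzero and remove this check.
    if x.get() == 0 {
        // Use abort to not clutter the asm with panic code.
        ::std::process::abort();
    }
}
```

With a validly constructed value, for example NonZeroU32::new(5).unwrap(), the function returns normally whether or not the check survives optimization.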

Regarding dangerousness, this is well within the domain of undefined behavior, where if you ever assume the optimizer will “do what you think”, you’re probably going to have a bad time. :)

But if you’re just curious…

It’s because LLVM’s range metadata, which allows the frontend to specify that an instruction’s result must be in a certain range of integer values, is (for some reason) only supported on load and call instructions. The second function uses a load; the first does not. If you compile to LLVM IR rather than assembly, you can search for !range to see what gets emitted. And here is the code in rustc that actually emits that metadata.

If you look at --emit=llvm-ir without optimization, validate_nzu32 doesn’t have range information on its parameter, nor does NonZero::get on its parameter or return value. But in validate_nzu32_slice, the load from the slice does carry !range !4, where !4 = !{i32 1, i32 0}. That range wraps around, so it covers every i32 value except 0.

(That is, ditto @comex)

Some additional information in this bug about this situation:

I really appreciate all the information here guys, thanks!

Even though it is obvious to me now, I did not consider that checks could be optimized out. A real-world case where you have to be careful is when you are using FFI to initialize or update values in place and want to validate them afterwards. For example:

use std::num::NonZeroU32;

extern "C" {
    pub fn maybe_write_u32(ptr: *mut u32);
}

pub struct Thing {
    x: NonZeroU32,
}

impl Thing {
    pub fn update(&mut self) {
        unsafe {
            maybe_write_u32(&mut self.x as *mut _ as *mut u32);
            let rx = &*(&self.x as *const _ as *const u32);
            if rx == &0 {
                ::std::process::abort();
            }
        }
    }
}

Code on rust.godbolt.org

Pretty gnarly. Unsafe is hard, can’t wait for guidelines.
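One possible way to sidestep the problem, sketched under assumptions rather than offered as official guidance: let the foreign code write into a plain u32 and only convert to NonZeroU32 after checking. The maybe_write_u32 body below is a stand-in written in Rust so the sketch compiles on its own (it always writes 7, which is made up); in real code it would be the extern "C" declaration from the example above.

```rust
use std::num::NonZeroU32;

// Stand-in for the foreign function; hypothetical behavior: always writes 7.
unsafe extern "C" fn maybe_write_u32(ptr: *mut u32) {
    unsafe { *ptr = 7 };
}

pub struct Thing {
    x: NonZeroU32,
}

impl Thing {
    pub fn update(&mut self) {
        // Hand the foreign code a plain u32, so a zero is never stored in
        // memory that is typed as NonZeroU32.
        let mut raw: u32 = self.x.get();
        unsafe { maybe_write_u32(&mut raw) };
        // NonZeroU32::new checks an ordinary u32, which carries no
        // "never zero" assumption for the optimizer to exploit.
        match NonZeroU32::new(raw) {
            Some(nz) => self.x = nz,
            None => ::std::process::abort(),
        }
    }
}
```

The design choice here is to validate before constructing, as the first post suggests: the NonZeroU32 field is only ever assigned a value that has already passed the zero check.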