Exploring different implementations of the same functionality in optimized assembly

Zero cost abstractions are cool and I wanted to strengthen my faith in them, so I started using the excellent cargo-asm to generate human readable asm for some code I was working on.

The code uses an external function call to initialize an array of u32s, ensures none of them are 0 and returns the values as NonZeroU32s. This involves some unsafe code and my goal was to make the rust code use as little unsafe tricks as possible while keeping the emitted asm as close to, what I perceive as, optimal. Disclaimer: I am very much a beginner in terms of understanding compiler optimizations and assembly so I will mostly observe and try to limit any judging.

First attempt

Supporting code
use std::num::NonZeroU32;

// Code from https://github.com/nvzqz/static-assertions-rs.
macro_rules! assert_eq_size {
    ($x:ty, $($xs:ty),+ $(,)*) => {
        $(let _ = ::std::mem::transmute::<$x, $xs>;)+
    };
    ($label:ident; $($xs:tt)+) => {
        #[allow(dead_code, non_snake_case)]
        fn $label() { assert_eq_size!($($xs)+); }
    };
}

mod ffi {
    extern "C" {
        pub fn overwrite_values(len: i32, values: *mut u32);
    }
}

#[inline]
fn overwrite_values(values: &mut [Option<NonZeroU32>]) {
    assert_eq_size!(Option<NonZeroU32>, u32);

    unsafe {
        ffi::overwrite_values(values.len() as i32, values.as_mut_ptr() as *mut u32);
    }
}

The function of interest in rust:

pub unsafe fn create_values_transmute() -> [NonZeroU32; 2] {
    // Reserve some unallocated memory on the stack.
    let mut values: [Option<NonZeroU32>; 2] = ::std::mem::uninitialized();

    // Overwrite the values with unknown new values.
    overwrite_values(&mut values);

    // Ensure all values are Some.
    for value in values.iter() {
        assert!(value.is_some());
    }

    // Strip away the Option without copying.
    ::std::mem::transmute::<[Option<NonZeroU32>; 2], [NonZeroU32; 2]>(values)
}

The function of interest in assembly:

experiments_rust::create_values_transmute:
  push    rax
  mov     rsi, rsp
  mov     edi, 2
  call    overwrite_values
  cmp     dword, ptr, [rsp], 0
  je      .LBB14_2
  mov     eax, dword, ptr, [rsp, +, 4]
  test    eax, eax
  je      .LBB14_2
  mov     rax, qword, ptr, [rsp]
  pop     rcx
  ret
.LBB14_2:
  lea     rdi, [rip, +, .Lanon.999ddb7825e3783297e13721a9ed219b.22]
  lea     rdx, [rip, +, .Lanon.999ddb7825e3783297e13721a9ed219b.21]
  mov     esi, 33
  call    std::panicking::begin_panic
  ud2

Code on rust.godbolt.org

Now, something odd is happening here. The Some checking loop is unrolled, but the first value is compared with one instruction, not using a register

cmp     dword, ptr, [rsp], 0

but the second value is compared with two instructions

mov     eax, dword, ptr, [rsp, +, 4]
test    eax, eax

It looks like the returned value is taken from the stack and placed in the rax register (mov rax, qword, ptr, [rsp]). With a larger array size this will most likely no longer work.

Manually unrolling the loop

What happens if we manually unroll the loop?

// Ensure all values are Some.
values[0].as_ref().unwrap();
values[1].as_ref().unwrap();
full rust
pub unsafe fn create_values_transmute_unrolled() -> [NonZeroU32; 2] {
    // Reserve some unallocated memory on the stack.
    let mut values: [Option<NonZeroU32>; 2] = ::std::mem::uninitialized();

    // Overwrite the values with unknown new values.
    overwrite_values(&mut values);

    // Ensure all values are Some.
    values[0].as_ref().unwrap();
    values[1].as_ref().unwrap();

    // Strip away the Option without copying.
    ::std::mem::transmute::<[Option<NonZeroU32>; 2], [NonZeroU32; 2]>(values)
}

The assembly:

cmp     dword, ptr, [rsp], 0
je      .LBB14_3
cmp     dword, ptr, [rsp, +, 4], 0
je      .LBB14_3
full asm
experiments_rust::thingy::create_values_transmute_unrolled:
  push    rax
  mov     rsi, rsp
  mov     edi, 2
  call    overwrite_values
  cmp     dword, ptr, [rsp], 0
  je      .LBB14_3
  cmp     dword, ptr, [rsp, +, 4], 0
  je      .LBB14_3
  mov     rax, qword, ptr, [rsp]
  pop     rcx
  ret
.LBB14_3:
  lea     rdi, [rip, +, .Lanon.2412bd351f75f47ca0d5104005dbeb29.2]
  call    core::panicking::panic
  ud2

Code on rust.godbolt.org

This time, the single instruction comparison with 0 is performed for the second value as well.

Getting rid of transmute

Because ::std::mem::transmute is so powerful, it is easy to mess something up. We can try to get rid of it by reconstructing the array from the individual values.

// Unwrap the options, hopefully it gets optimized and happens in place.
[
    values[0].take().unwrap(),
    values[1].take().unwrap(),
]
full rust
pub unsafe fn create_values_take() -> [NonZeroU32; 2] {
    // Reserve some unallocated memory on the stack.
    let mut values: [Option<NonZeroU32>; 2] = ::std::mem::uninitialized();

    // Overwrite the values with unknown new values.
    overwrite_values(&mut values);

    // Unwrap the options, hopefully it gets optimized and happens in place.
    [
        values[0].take().unwrap(),
        values[1].take().unwrap(),
    ]
}

The assembly:

mov     ecx, dword, ptr, [rsp]
mov     dword, ptr, [rsp], 0
test    rcx, rcx
je      .LBB14_3
mov     eax, dword, ptr, [rsp, +, 4]
mov     dword, ptr, [rsp, +, 4], 0
test    rax, rax
je      .LBB14_3
full asm
experiments_rust::thingy::create_values_take:
  push    rax
  mov     rsi, rsp
  mov     edi, 2
  call    overwrite_values
  mov     ecx, dword, ptr, [rsp]
  mov     dword, ptr, [rsp], 0
  test    rcx, rcx
  je      .LBB14_3
  mov     eax, dword, ptr, [rsp, +, 4]
  mov     dword, ptr, [rsp, +, 4], 0
  test    rax, rax
  je      .LBB14_3
  shl     rax, 32
  or      rax, rcx
  pop     rcx
  ret
.LBB14_3:
  lea     rdi, [rip, +, .Lanon.19f1b27cb02834246aaa2ebed86c020d.2]
  call    core::panicking::panic
  ud2

Code on rust.godbolt.org

Unfortunately, using take means a None gets written to the stack. It does not look like the values on the stack will be accessed in the non-panic case, since the return value is constructed from the registers ecx and eax. In case of a panic though, the compiler is unable to prove that the stack is not accessed which means that it has to set the values to 0. (Please correct me if I’m wrong about any of this!)

Getting rid of take

Instead of using Option::take we can use std::mem::replace in conjunction with std::mem::uninitialized to try and get the compiler to not write a 0 back to the stack.

// Unwrap the options, hopefully it gets optimized and happens in place.
[
    ::std::mem::replace(&mut values[0], ::std::mem::uninitialized()).unwrap(),
    ::std::mem::replace(&mut values[1], ::std::mem::uninitialized()).unwrap(),
]
full rust
pub unsafe fn create_values_replace() -> [NonZeroU32; 2] {
    // Reserve some unallocated memory on the stack.
    let mut values: [Option<NonZeroU32>; 2] = ::std::mem::uninitialized();

    // Overwrite the values with unknown new values.
    overwrite_values(&mut values);

    // Unwrap the options, hopefully it gets optimized and happens in place.
    [
        ::std::mem::replace(&mut values[0], ::std::mem::uninitialized()).unwrap(),
        ::std::mem::replace(&mut values[1], ::std::mem::uninitialized()).unwrap(),
    ]
}

The assembly:

experiments_rust::thingy::create_values_replace:
  jmp     _ZN16experiments_rust6thingy18create_values_take17h47d8bdb4a7c32c3cE@PLT

Code on rust.godbolt.org

Wait, what?! It simply calls the take version! What if we compile without create_values_take?

mov     ecx, dword, ptr, [rsp]
test    rcx, rcx
je      .LBB14_3
mov     eax, dword, ptr, [rsp, +, 4]
test    rax, rax
je      .LBB14_3
full asm
experiments_rust::thingy::create_values_replace:
  push    rax
  mov     rsi, rsp
  mov     edi, 2
  call    overwrite_values
  mov     ecx, dword, ptr, [rsp]
  test    rcx, rcx
  je      .LBB14_3
  mov     eax, dword, ptr, [rsp, +, 4]
  test    rax, rax
  je      .LBB14_3
  shl     rax, 32
  or      rax, rcx
  pop     rcx
  ret
.LBB14_3:
  lea     rdi, [rip, +, .Lanon.c043895d4cb9cfccd6c56ffc4fe98f75.2]
  call    core::panicking::panic
  ud2

Code on rust.godbolt.org

Woah it worked and woah that is scary! The presence of the take version affects whether or not the replace version will do what you think it will!

Conclusion

All code together: rust.godbolt.org.

  1. The register comparison with mov, test and the stack comparison with cmp provide equivalent functionality. We even get a mixed result with the create_values_transmute version. Why isn’t one preferred over the other? Isn’t it strange the the manual unrolling of the 2 item loop yields a different result?

  2. I’m surprised the compiler detects that create_values_replace can simply call create_values_take. It adds two mov instructions that are not necessary to reduce code size. It is a good reminder ::std::mem::uninitialized() means the compiler is free to write a value there, even if you are not.

  3. Some other time I will investigate what happens when the return value can’t be fit in registers.

1 Like

Are you attempting to detect that the FFI actually wrote to them? Because right the code is unsound (matches on uninitialized) if overwrite_values doesn’t set them at all.

Also, a small request: can you include a godbolt or playground link? It’d make it way easier to play along at home…

Yes, I am assuming ffi::overwrite_values overwrites all of the values in this case. In a real life scenario it is definitely worth considering not having this assumption and initializing the buffer to all Nones.

A fantastic idea to add godbolt links. Will do as soon as I get to my pc.

EDIT: Added godbolt links.

The following looks UB as well:

fn overwrite_values(values: &mut [Option<NonZeroU32>]) {
    assert_eq_size!(Option, u32);
    unsafe {
        ffi::overwrite_values(values.len() as i32, values.as_mut_ptr() as *mut u32);
    }
}

// Reserve some unallocated memory on the stack.
let mut values: [Option<NonZeroU32>; 2] = ::std::mem::uninitialized();

// Overwrite the values with unknown new values.
overwrite_values(&mut values);

overwrite_values takes a &mut, and forming a mutable reference to that is immediate UB, AFAIK. So I think before @scottmcm’s concern comes into play (or maybe he was referring to this aspect as well), the “setup” code itself hits UB.

I am not sure what exactly you mean. My best guess is you are saying that in general for some type T

let mut value: T = ::std::mem::uninitialized();
let r = &mut value; // <-- UB?

because for example, *r = whatever drops uninitialized T when written to. Another problem is if T only allows certain valid bit patterns, rust’s assumptions no longer hold.

But then, looking at std::io::BufReader::with_capacity for example we see:

let mut buffer = Vec::with_capacity(cap);
buffer.set_len(cap);
inner.initializer().initialize(&mut buffer);

where Vec::set_len introduces uninitialized memory and also passes the memory as a slice to the initializer.

Again I’m not sure if I understood exactly what you were calling UB, can you elaborate? I definitely want to broaden my understanding of these things.

Yes, specifically due to use of mem::uninitialized.

My understanding is getting a mutable reference to anything that’s std::mem::uninitialized is “instant” UB. The only caveat is if you’re using the mutable reference to immediately turn it into a raw ptr:

let mut x: SomeThing = std::mem::uninitialized();
let raw = &mut x as *mut _;

It’s a bit odd, but I think the above is fine; there’s no way to actually get a raw pointer without first going through a reference in cases like this. Although I’m not sure where cases like this will eventually settle, given the ongoing effort to formalize and refine the unsafe guidelines.

There was some discussion in a thread somewhat recently, probably key post being this one.

My understanding is getting a mutable reference to anything that’s std::mem::uninitialized is “instant” UB.

I believe the exact UB conditions are still unspecified. Passing &mut T pointing to uninitialized memory to a function happens all the time in the std. Another example:

    let mut buf = unsafe {
        let mut buf: [u8; super::DEFAULT_BUF_SIZE] = mem::uninitialized();
        reader.initializer().initialize(&mut buf);
        buf
    };

You generally can’t do anything with uninitialized memory unless you form a mutable reference to it, so forbidding that would be super inconvenient.

Right, I’m not sure what exactly is the current stance. There’s an issue to deprecate uninitialized, but I can’t quite tell whether code like the above is already UB or is technically UB but isn’t treated as such by the current compiler, will never be UB, or something else entirely.

The thread I linked to in my previous reply certainly makes it look like doing anything with uninitialized is fraught with peril, but clearly we have existing code today in the ecosystem that “works”. In particular, I was quite surprised that even ptr::read would trigger UB - I always thought touching uninit via “raw” access wasn’t subject to these rules, but apparently that was incorrect.

Under https://www.ralfj.de/blog/2018/08/22/two-kinds-of-invariants.html, it's definitely unsafe, but I think the jury is still out on whether it's invalid. I'm pretty sure it's not, however, since it's done all over the place, including in ptr::swap.

That said, I completely agree that using [MaybeUninit<u32>; 2] (or similar) would be better -- once we get https://github.com/rust-lang/rust/pull/53508.

So I’ve been reading about ::std::mem::uninitialized, its problems with uninhabited types, MaybeUninit and about undef, poison and freeze (https://www.cs.utah.edu/~regehr/papers/undef-pldi17.pdf) in LLVM.

Still, I find it very hard to pin down what the dangers are of working with uninitialized memory and how to take care of all aspects. I am going to make an attempt at describing the problems of uninitialized memory here.

My desire as a programmer is to

  1. reserve a piece of memory,
  2. optionally write to the memory, and
  3. read from the memory.

A first issue is that of undef. As expected, the compiler is allowed to leave any value in the uninitialized memory. What I, and probably many others, did not expect is that the compiler is allowed to change the values in between reads! This makes sense if you think about it. However it would be nice to indicate that after a certain point, the value should be fixed, even if you don’t know what it is. In fact, I think that we want the values to be fixed when we read and immediately after we write. This also means that taking a (mutable) reference should also fix the value right there and then, since we can pass those around and do reads and writes using them.

Unfortunately I don’t know how to instruct the compiler to fixate an uninitialized value. Is there any way to?

Secondly there is the issue of uninhabited types. I haven’t read enough on this subject to really understand whats going on but basically, trying to create a value of the type ! shouldn’t be possible and will lead to undefined behavior immediately if you do.

Of course I’m havent touched the dangers of “creating” objects out of thin air using ptr::read, mem::transmute, mem::uninitialized and the like. Regardless I probably already missed certain aspects of the perils of uninitialized memory.

It feels like you really just have two options:

  • Initialize the memory before calling whatever and check the result (all safe)
  • Don’t initialize and don’t check, trusting the unsafe code to do the right thing because unsafe

(Or perhaps the former in debug and the latter in release)

For anyone following the thread who wants to read about this: https://llvm.org/docs/LangRef.html#undefined-values

Thank you for that. Fortunately, apart from a probably not so good way of gathering entropy I don’t see a use case for reading uninitialized memory intentionally.

I’m starting to feel like

This is incredibly dangerous and should not be done lightly. …
https://doc.rust-lang.org/std/mem/fn.uninitialized.html

is an understatement :sweat_smile:.

I have some more questions

  1. If you pass some uninitialize memory through FFI and the FFI code does not in fact write to the memory. Can you end up with an undef? If yes, that means you have to trust the foreign code to actually write, right?

  2. Is the only thing, that turns an undef it into a normal value, overwriting it?

  1. If I have uninitialized memory, can I take a reference to it if I guarantee that I will write to the memory before reading? I don’t understand why taking a reference is instant UB.

Just trying to follow along, but I would guess that if uninitialized memory is undef and can change at any time, and if &mut guarantees the referent will not change except via writes from the borrower, then an &mut to undef would clearly be undefined behavior… the optimizer could end up storing two different values in registers/memory (via undef) while operating under the assumption that they are the same value (via &mut).

EDIT: On second thought, mem::uninitialized seems to contradict this, so what do I know. :slight_smile:

Yes. You probably won’t see an effect of that on x86, but IIRC there is some hardware that tracks uninitialized itself, and will fault on the read if the other code never initialized it.

Yes, as far as I know.

Intel Itanium, and all the HP, IBM, Dell, Sun, and maybe other computer systems based on that processor.

Itanium is on its last breath, but curious - my recollection is Itanium has an extra bit in the register to tag the value as being “unset”. IIRC, some loads can set this bit under some conditions; for example, if the load accesses an unmapped page, the reg storing the address of whatever memory was addressed has the bit flipped.

Now, how would this work with stack area, such as the case when you’d std::mem::uninitialized some value that you then pass a ptr to FFI to fill out? The stack location is almost certainly going to be a valid page. How would Itanium detect this particular case today?

@mickvangelderen this is awesome, thanks for posting this. I’m gonna take a deeper look tomorrow but just having a cursory look, very interesting! Coming from C and asm, very familiar with godbolt and will be fun to use cargo-asm and examine some Rust-generated binaries as well when I get around to it.

1 Like