Worrisome unions

I recently discovered some behaviour of a union that looks like an outright bug.
Take a look at this little snippet:

```rust
pub fn main() {
    union U {
        word: u64,
        byte: u8,
    }
    let mut v = U { word: u64::MAX };
    v = U { byte: 0 };
    // it passes in release,
    // but not in debug !!
    assert!(unsafe { v.word } == 0);
}
```

The first point of concern is that the behaviour of this code changes depending on whether it is compiled with optimisations. It shouldn't, right?
You can see it here.

The second is that it behaves differently from C.
You can see here that, where the lowered C code just stores the lower byte into the same memory, the lowered Rust code for some reason writes the byte at an offset. Why does Rust do this?

The behaviour of C seems reasonable, while Rust's seems to be the complete opposite.

This program has undefined behavior.

Your union is using the default representation. There are no guarantees of data layout in the default representation. The reference even says of unions that

Fields might have a non-zero offset (except when the C representation is used); in that case the bits starting at the offset of the fields are read. It is the programmer's responsibility to make sure that the data is valid at the field's type. Failing to do so results in undefined behavior.

Even if the layout were guaranteed, you overwrote v with U { byte: 0 }; therefore, all 7 bytes of v that are not the u8 value are now uninitialized (because they are padding of the U { byte: 0 } value, and the compiler is not obligated to only overwrite the byte). Reading that uninitialized data is undefined behavior.

The likely reason that, in practice, the assert fails in debug and passes in release is that debug builds copy all of the bytes of the U value, whereas release builds notice that only the byte field needs to be copied and copy only one byte. But that is merely an accident of the implementation. You cannot rely on it, because your program has undefined behavior.

If you add #[repr(C)], then the layout is guaranteed to match a C union. But you would still have to write

v.byte = 0;

instead of v = U { byte: 0 }; so that you are modifying only the byte and not any of the space around it.
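As a rough sketch of that fix (assuming the same `U` as in the question, now marked `#[repr(C)]`), the following writes only the `byte` field and then reads `word` back. Assigning to a `Copy` union field is safe and touches only that field's byte, so the other seven bytes keep their `0xFF` values and reading `word` is well defined. Note that the result is still not `0`, so the original `assert!` would still fail; only the first byte in memory is cleared. The expected value is built from native-endian bytes so the check holds on both little- and big-endian targets.

```rust
// Sketch: #[repr(C)] union plus a field assignment instead of a whole-value write.
#[repr(C)]
union U {
    word: u64,
    byte: u8,
}

fn main() {
    let mut v = U { word: u64::MAX };

    // Writing a `Copy` union field is safe and writes only that field's
    // byte; the remaining 7 bytes still hold 0xFF from the initial value.
    v.byte = 0;

    // With #[repr(C)], `byte` is at offset 0, so it overlaps the
    // lowest-addressed byte of `word`. All 8 bytes are initialized,
    // so this read is well defined.
    let word = unsafe { v.word };

    // Expected value: every byte 0xFF except the first one in memory.
    let mut expected = u64::MAX.to_ne_bytes();
    expected[0] = 0;
    assert_eq!(word, u64::from_ne_bytes(expected));

    // On little-endian targets this prints 0xffffffffffffff00.
    println!("{word:#x}");
}
```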


Additionally: it is good Rust practice to avoid writing unsafe code whenever possible, and in particular to use existing libraries with well-tested unsafe if any unsafe is needed. New union types are rarely needed; reinterpretation of bytes can often be done in better ways — for example, it can be completely safe using the bytemuck library.
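For this particular case, a minimal sketch using only the standard library (not bytemuck, whose API is not shown here) can do the same reinterpretation with no `unsafe` at all. `u64::to_ne_bytes` and `u64::from_ne_bytes` convert between an integer and its native-endian byte array; the helper names `clear_first_byte` and `clear_low_byte` are hypothetical, chosen only for illustration. The second helper covers the common case where what you really want is the least-significant byte, which a mask handles independently of endianness.

```rust
// Hypothetical helper: clear the first in-memory byte of a u64, like the
// #[repr(C)] union write does, but without a union or unsafe code.
fn clear_first_byte(word: u64) -> u64 {
    // Copy the integer out as its native-endian bytes.
    let mut bytes = word.to_ne_bytes();
    bytes[0] = 0;
    // Every bit pattern is a valid u64, so reassembling is always sound.
    u64::from_ne_bytes(bytes)
}

// Hypothetical helper: clear the least-significant byte by value.
// A mask gives the same result on every target, regardless of endianness.
fn clear_low_byte(word: u64) -> u64 {
    word & !0xFF
}

fn main() {
    let by_memory = clear_first_byte(u64::MAX);
    let by_value = clear_low_byte(u64::MAX);
    assert_eq!(by_value, 0xFFFF_FFFF_FFFF_FF00);
    println!("{by_memory:#x} {by_value:#x}");
}
```

On little-endian targets the two helpers agree; on big-endian targets the first one clears the most-significant byte instead, which is exactly the layout dependence the union version has.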


Brilliant! This explains everything that caused me havoc. Thank you. :grin: