Conversion using type punning

Hello,

I’ve been reading about type punning and undefined behavior, and I’ve run into a situation where I’m still not sure whether there is any undefined behavior. Given the following:

#[repr(C)]
union Foo {
    b: Bar,
    v: u64,
}

#[repr(C)]
#[derive(Clone, Copy)]
struct Bar {
    lo: u32,
    hi: u32,
}

The idea is to use the union Foo to convert two u32 values into a u64 value, like this:

use std::mem::MaybeUninit;

fn convert(lo: u32, hi: u32) -> u64 {
    let mut foo = MaybeUninit::<Foo>::uninit();
    let ptr = foo.as_mut_ptr() as *mut Foo;
    unsafe {
        (*ptr).b.lo = lo;  // dereferencing a raw pointer is unsafe,
        (*ptr).b.hi = hi;  // but is it undefined behavior?
        (*ptr).v
    }
}

I suppose that although convert contains unsafe, it should always be correct. Is this true?

Many thanks for any help.

First, it’s not clear to me why you’re using a pointer:

fn convert(lo: u32, hi: u32) -> u64 {
    let u = Foo { b: Bar { lo, hi }};
    unsafe { u.v }
}

Second: Judging from https://doc.rust-lang.org/reference/items/unions.html, this does not invoke undefined behavior:

Unions have no notion of an “active field”. Instead, every union access just interprets the storage at the type of the field used for the access. Reading a union field reads the bits of the union at the field’s type. It is the programmer’s responsibility to make sure that the data is valid at that type. Failing to do so results in undefined behavior.

Third: Even though it is not UB, it looks very suspicious because it is sensitive to machine endianness.
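To make the endianness point concrete, here is a minimal sketch comparing the union-based version with a shift-based one that gives the same result on any target. The names convert_shift and convert_union are hypothetical, chosen only for this illustration; Foo and Bar are repeated from the question so the snippet stands alone.

```rust
#[repr(C)]
#[derive(Clone, Copy)]
struct Bar {
    lo: u32,
    hi: u32,
}

#[repr(C)]
union Foo {
    b: Bar,
    v: u64,
}

/// Shift-based conversion (hypothetical name): no unsafe, and the result
/// does not depend on the byte order of the target.
fn convert_shift(lo: u32, hi: u32) -> u64 {
    (u64::from(hi) << 32) | u64::from(lo)
}

/// Union-based conversion as discussed in the thread (hypothetical name).
fn convert_union(lo: u32, hi: u32) -> u64 {
    let u = Foo { b: Bar { lo, hi } };
    // SAFETY: Bar is repr(C) with two u32 fields, so it is 8 bytes with no
    // padding, and every 8-byte bit pattern is a valid u64.
    unsafe { u.v }
}
```

On a little-endian target such as x86 the two functions should agree; on a big-endian target the union version would put lo in the upper half instead.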


@ExpHP, thank you.

Your solution is indeed much cleaner.

it’s not clear to me why you’re using a pointer

Sorry, that’s because of a particular (and weird) reason: I don’t want to use Bar explicitly (the code of convert should not contain any initialization of Bar).

it is sensitive to machine endianness.

Yeah, you are correct; this code is indeed intended only for x86.

Are you using this to read a *const Bar (or &Bar, etc) as u64? In that case, you do have a problem, because *const Bar is only aligned to 4 bytes instead of 8.


Sorry, I’m a novice; could you elaborate a little on this point?

Bar is aligned to 32-bit boundaries. Foo and u64 are aligned to 64-bit boundaries. So when you do:

let x: *const Bar = ...;
x as *const Foo

there is no guarantee that the resulting pointer is correctly aligned.

Neither the documentation nor the reference states whether dereferencing a raw pointer is more similar to std::ptr::read or to std::ptr::read_unaligned. Based on this issue, it sounds like it is more similar to std::ptr::read. Therefore, it requires an aligned pointer, or else it is UB.
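As a rough sketch of how one could sidestep the alignment requirement when starting from a Bar, std::ptr::read_unaligned can read the eight bytes without assuming u64 alignment. The helper names bar_as_u64 and bar_alignment are hypothetical and only serve this illustration; Bar is repeated from the question.

```rust
use std::mem::align_of;
use std::ptr;

#[repr(C)]
#[derive(Clone, Copy)]
struct Bar {
    lo: u32,
    hi: u32,
}

/// Hypothetical helper: reinterpret the 8 bytes of a Bar as a u64 without
/// requiring the address to be aligned for u64.
fn bar_as_u64(bar: &Bar) -> u64 {
    let p = bar as *const Bar as *const u64;
    // SAFETY: `bar` refers to 8 initialized, readable bytes (Bar is repr(C)
    // with two u32 fields and no padding), and read_unaligned places no
    // alignment requirement on `p`.
    unsafe { ptr::read_unaligned(p) }
}

/// Hypothetical helper: Bar only needs the alignment of its u32 fields,
/// which may be smaller than the alignment of u64.
fn bar_alignment() -> usize {
    align_of::<Bar>()
}
```

The result still depends on the target’s byte order, exactly as with the union.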

Ah, I understand this point, but how does it relate to convert?

fn convert(lo: u32, hi: u32) -> u64 {
    let mut foo = MaybeUninit::<Foo>::uninit();
    let ptr = foo.as_mut_ptr() as *mut Foo;
    unsafe {
        (*ptr).b.lo = lo;
        (*ptr).b.hi = hi;
        (*ptr).v
    }
}

Do you mean that ptr and &(*ptr).b could be aligned differently, so that the lo and hi initialization might put these values into locations other than the low and high words of v?

Ah, okay, I misread this quote:

Sorry, that’s because of a particular (and weird) reason: I don’t want to use Bar explicitly (the code of convert should not contain any initialization of Bar).

When I first read it, I got the impression that you were not actually using this function, but were instead doing something else (which could possibly have involved a *const Bar).

Now I see you are referring to how the function as originally written does not ever mention Bar.
