Based on what @RalfJung has written about pointers and integer casts, it sounds like it's okay. It has to be, because it's considered sound to use memcpy to move pointers around, and what you're doing really isn't different from what memcpy does. (You're using OR to manipulate individual bits, but memcpy can use SIMD intrinsics to move multiple pointers at once in a single byte vector, so both need provenance tracking that works on individual pieces of a pointer.)
Add #[repr(C)] or #[repr(transparent)] to Data to ensure it's represented the same as an i32. Then use a match like @notriddle suggested to create the Option. It might be okay to transmute, but why risk it when you can do a clearly okay pointer cast?
Could you please explain why I need #[repr(transparent)] in this case? I'm confused because I'm not creating a reference to i32 but a reference to Data itself. I'm not making any assumptions about the representation of Data (except that the LSB is zero, but this is pretty sound AFAICS).
Yes, this is how it's done for now. But it doesn't get optimized in debug mode, and this is a heavily used part of the code. The profiler tells me transmute could speed it up by up to 35%, and I'm quite concerned about the tests' run time.
This is an excellent point and the reason why I'm asking in the first place. But please note: I'm not transmuting between Data and Option, I'm transmuting between usize and Option<&T>. Both usize and &T are repr(C), but I'm not sure about Option, so here I am.
Your reference was originally created from a Box<i32>, so the reference points to an i32. Turning such a reference into a &Data requires that they have equivalent layout.
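To make the layout point concrete, here is a minimal sketch of the pointer-cast route. The original Data and TaggedPtr definitions are not shown in this thread, so the Data wrapper, TAG_MASK, and decode below are hypothetical stand-ins. It assumes the tag lives in the lowest bit and that the address was produced with an `as usize` cast from a live Box<i32>.

```rust
// Hypothetical stand-in for the thread's `Data`; repr(transparent) guarantees
// it has the same layout as the i32 that the Box actually holds.
#[repr(transparent)]
struct Data(i32);

// Assumed tag layout: only the lowest bit is used (i32 is 4-byte aligned).
const TAG_MASK: usize = 1;

/// Strips the tag and turns the address into an optional reference.
///
/// # Safety
/// After masking, `tagged` must be 0 or the address of a live, aligned i32
/// that stays valid for `'a`.
unsafe fn decode<'a>(tagged: usize) -> Option<&'a Data> {
    let addr = tagged & !TAG_MASK;
    // `as_ref` returns None for a null pointer, so no transmute is needed.
    unsafe { (addr as *const Data).as_ref() }
}

fn main() {
    let raw: *mut i32 = Box::into_raw(Box::new(42));
    let tagged = raw as usize | TAG_MASK;

    let got = unsafe { decode(tagged) };
    assert_eq!(got.map(|d| d.0), Some(42));
    assert!(unsafe { decode(TAG_MASK) }.is_none());

    // Free the allocation once the reference is no longer used.
    unsafe { drop(Box::from_raw(raw)) };
}
```

Without the repr attribute on Data, the cast from an i32 address to *const Data would rely on a layout the compiler does not promise.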
Option-like enums where the payload defines at least one niche value are guaranteed to be represented using the same memory layout as their payload. ... niche values are used to represent the unit variant.
The niche of a type determines invalid bit-patterns that will be used by layout optimizations.
For example, &mut T has at least one niche, the "all zeros" bit-pattern. This niche is used by layout optimizations like "enum discriminant elision" to guarantee that Option<&mut T> has the same size as &mut T.
&T has the "all zeros" niche value too, so I'm deriving from this: Option<&T> is guaranteed to have the same layout as &T when it is Some, and the "all zeros" pattern when it is None. This is the same layout as *const T, so as long as the usize contains a valid address, the transmute is valid.
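The layout part of this argument can be checked with a small sketch. The function names wrap and none_from_zero are made up for illustration. Note that it only covers the layout side: it transmutes a real reference and an all-zero integer, and deliberately does not transmute a nonzero integer into a reference, since whether such a value carries usable provenance is a separate question from layout.

```rust
use std::mem::{align_of, size_of, transmute};

/// Reinterprets a reference as Some(reference) without a match.
fn wrap(r: &i32) -> Option<&i32> {
    // SAFETY: every valid &i32 is also a valid Some(&i32) with the same bits.
    unsafe { transmute::<&i32, Option<&i32>>(r) }
}

/// Builds None from the all-zero bit pattern.
fn none_from_zero() -> Option<&'static i32> {
    // SAFETY: all zeros is the niche of &i32 and is read back as None.
    unsafe { transmute::<usize, Option<&'static i32>>(0usize) }
}

fn main() {
    // Same size and alignment as a plain reference and as a raw pointer.
    assert_eq!(size_of::<Option<&i32>>(), size_of::<&i32>());
    assert_eq!(size_of::<Option<&i32>>(), size_of::<*const i32>());
    assert_eq!(align_of::<Option<&i32>>(), align_of::<&i32>());

    let x = 5;
    assert_eq!(wrap(&x), Some(&5));
    assert!(none_from_zero().is_none());
}
```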
You're right, TaggedPtr doesn't come into play. Option isn't repr anything but @pcpthm could be right and Option<&T> is a special case where no UB occurs? I don't know.
But layout isn't enough: this code doesn't trigger any warning from the compiler or from Miri, but it is UB:
struct TwoInt(u32, u32);

fn main() {
    let x = TwoInt(0, 1);
    let y: u64 = unsafe { std::mem::transmute(x) };
}
It seems to me that it should be defined behaviour based on the link @pcpthm posted:
Option-like enums where the payload defines at least one niche value are guaranteed to be represented using the same memory layout as their payload. This is called discriminant elision, as there is no explicit discriminant value stored anywhere. Instead, niche values are used to represent the unit variant.
The most common example is that Option<&u8> can be represented as a nullable &u8 reference -- the None variant is then represented using the niche value zero. This is because a valid &u8 value can never be zero, so if we see a zero value, we know that this must be the None variant.
Example. The type Option<&u32> will be represented at runtime as a nullable pointer. FFI interop often depends on this property.
Example. As fn types are non-nullable, the type Option<extern "C" fn()> will be represented at runtime as a nullable function pointer (which is therefore equivalent to a C function pointer). FFI interop often depends on this property.
Well, if my understanding is correct, the code you posted is not UB as long as TwoInt has size 8 and align 4 (but the Rust compiler can choose other layouts).
Thus,
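use std::mem::{align_of, size_of};

struct TwoInt(u32, u32);

fn main() {
    let x = TwoInt(0, 1);
    // Check the layout the compiler actually picked before reinterpreting.
    assert!(size_of::<TwoInt>() == 8 && align_of::<TwoInt>() == 4);
    let y: u64 = unsafe { std::mem::transmute(x) };
}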
Yes, under my assumption y can take one of two values, depending on the field order the compiler picks (and on endianness). But that is only unspecified behavior (the outcome comes from a defined set, in this case of size two), not undefined behavior (where the outcome can be anything). (I'm 80% sure about this.)
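For comparison, here is a sketch of a variant that removes the layout question entirely. It is not the code from the posts above: it adds #[repr(C)], which fixes the field order, so the only remaining variation is endianness. The helper name as_u64 is made up.

```rust
// With repr(C) the fields are laid out in declaration order: size 8, align 4,
// no padding. This differs from the default-repr struct discussed above.
#[repr(C)]
struct TwoInt(u32, u32);

fn as_u64(x: TwoInt) -> u64 {
    // SAFETY: TwoInt is 8 fully initialized bytes, and every bit pattern
    // is a valid u64.
    unsafe { std::mem::transmute(x) }
}

fn main() {
    let y = as_u64(TwoInt(0, 1));
    // The second field occupies bytes 4..8, so the result depends only on
    // the target's byte order.
    let expected = if cfg!(target_endian = "little") { 1u64 << 32 } else { 1 };
    assert_eq!(y, expected);
}
```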