How are enum discriminants sized in Rust?

I'm seeing an interesting discrepancy in code generation and optimization between 32-bit and 64-bit targets, specifically in how the Rust compiler sizes the enum discriminant. Take an Option enum, where the tag could in theory fit in a single byte. Consider the following code (compiled with --release):

use std::sync::Mutex;

const SIZE: usize = 256;

#[repr(C, align(8))]
struct Foobar {
    d: [u8; SIZE],
    a: u64,
    b: u64,
    c: u32,
}

#[inline(never)]
fn get_int(x: usize) -> usize {
    let mut ret = x;
    unsafe {
        core::arch::asm!("add {}, 4", inout(reg) ret)
    }
    ret
}

#[inline(never)]
fn vvv(mybar: &Option<Foobar>) {
    println!("HELLO {}", mybar.as_ref().unwrap().a);
    println!("HELLO {}", mybar.as_ref().unwrap().b);
    println!("HELLO {}", mybar.as_ref().unwrap().c);
}

fn consume(data: &[u8]) {
    for b in data {
        print!("{b:02x}");
    }
    println!("");
}

static HELLO: Mutex<Option<Foobar>> = Mutex::new(None);
static BYE: Mutex<Option<Foobar>> = Mutex::new(None);

pub fn main() {
    let mut a = Foobar {
        a: get_int(1) as u64,
        b: get_int(2) as u64,
        c: get_int(3) as u32,
        d: [0; SIZE],
    };

    for i in 0..get_int(23) {
        a.d[i as usize % SIZE] += i as u8;
    }

    {
        const SIZE2: usize = size_of::<Option<Foobar>>();
        let mut data = [0xcc as u8; SIZE2];
        for i in 0..get_int(26) {
            data[i % SIZE2] = data[i % SIZE2].wrapping_add(i as u8);
        }
        consume(&data);
    }

    unsafe { core::arch::asm!("") }

    *HELLO.lock().unwrap() = Some(a);

    unsafe { core::arch::asm!("") }

    let tag = unsafe {
        let tmp = HELLO.lock().unwrap();
        *(&*tmp as *const _ as *const u32)
    };
    println!("The tag of HELLO: 0x{tag:08x}");

    #[allow(invalid_reference_casting)]
    let tag = unsafe {
        let tmp = HELLO.lock().unwrap();
        *(&*tmp as *const _ as *const u32 as *mut u32) ^= 0xff00;
        *(&*tmp as *const _ as *const u32)
    };
    println!("The tag of HELLO (after mutating): 0x{tag:08x}");

    unsafe { core::arch::asm!("") }
    println!("Attempting to dereference HELLO");
    vvv(&*HELLO.lock().unwrap());

    unsafe { core::arch::asm!("") }
    println!("Attempting to dereference HELLO in second way");
    let ptr = &HELLO as *const _ as usize;
    let ptr2 = unsafe {
        &*(ptr as *const Mutex<Option<Foobar>>)
    };
    let mut hello_tmp = ptr2.lock().unwrap();
    let x = hello_tmp.as_mut().unwrap();
    println!("HELLO {}", x.a);

    println!("Attempting to dereference BYE (which should panic)");
    vvv(&*BYE.lock().unwrap());
}

On x86_64-unknown-linux-gnu it panics on the second attempt to dereference HELLO, because in that particular case the generated code appears to read the Option enum tag as a u32:

cccdcecfd0d1d2d3d4d5d6d7d8d9dadbdcdddedfe0e1e2e3e4e5e6e7e8e9cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc
The tag of HELLO: 0x00000001
The tag of HELLO (after mutating): 0x0000ff01
Attempting to dereference HELLO
HELLO 5
HELLO 6
HELLO 7
Attempting to dereference HELLO in second way

thread 'main' panicked at undefined_behavior/src/main.rs:93:32:
called `Option::unwrap()` on a `None` value
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Interestingly, the exact same code on i686-unknown-linux-gnu treats the enum tag as a u8 and leaves potentially uninitialized data in the remaining bytes of the tag location. It only panics when it tries to dereference BYE.

cccdcecfd0d1d2d3d4d5d6d7d8d9dadbdcdddedfe0e1e2e3e4e5e6e7e8e9cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc
The tag of HELLO: 0xcecdcc01
The tag of HELLO (after mutating): 0xcecd3301
Attempting to dereference HELLO
HELLO 5
HELLO 6
HELLO 7
Attempting to dereference HELLO in second way
HELLO 5
Attempting to dereference BYE (which should panic)

thread 'main' panicked at undefined_behavior/src/main.rs:24:41:
called `Option::unwrap()` on a `None` value
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Is there any reason for this discrepancy between 32-bit and 64-bit? It is also worth noting that there are certain cases in 64-bit mode where Rust only does a byte comparison as opposed to a full u32 check.

Enum layout for repr(Rust) enums is not stable between Rust versions or platforms. Compare e.g. this (which says essentially nothing), and the only additional guarantees that I’m aware of, off the top of my head, are the ones for Option specifically documented here.
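To make the Option-specific guarantees concrete, here is a minimal sketch. The helper name `option_is_free` is made up for illustration; the checks rely only on the representation guarantees the `Option` documentation gives for references, `Box` and the non-zero integer types.

```rust
use std::mem::size_of;
use std::num::NonZeroU32;

/// Hypothetical helper: true when wrapping `T` in `Option` adds no bytes.
fn option_is_free<T>() -> bool {
    size_of::<Option<T>>() == size_of::<T>()
}

fn main() {
    // These cases are covered by the documented guarantee for `Option`:
    // `None` is stored as the null/zero bit pattern the payload can never have.
    println!("Option<&u8>:        {}", option_is_free::<&u8>());
    println!("Option<Box<u8>>:    {}", option_is_free::<Box<u8>>());
    println!("Option<NonZeroU32>: {}", option_is_free::<NonZeroU32>());

    // `u64` uses every bit pattern, so `None` needs storage of its own.
    println!("Option<u64>:        {}", option_is_free::<u64>());
}
```

Anything outside that documented list (including `Option<Foobar>` from the question) falls back to "whatever the compiler picks".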

Nonetheless, it can be interesting to discuss what’s going on, e.g. for curiosity about performance of code, or to explain why datatypes get the size that they get, or with interest in compiler design, etc…

I have a hard time following how exactly the example code you provided connects to the goal of inspecting layout or codegen from the compiler, especially since your code example contains undefined behavior in multiple places (as far as I can tell). So I’m not convinced yet that you aren’t simply observing differences in how the compiler handles code with UB, which isn’t particularly interesting IMHO, since we aren’t supposed to feed it code with UB in the first place.

As for the layout that happens to be chosen here: as far as I can tell from the “rustc_layout” compiler-debugging attribute, and counter to your conclusion that it is only a u32, the enum tag for your Option<Foobar> example is actually a full 8-byte u64/i64 value. That is on Rust 1.87 as well as the 7-day-old nightly, in the (presumably 64-bit Linux?) environment that play.rust-lang.org runs in.
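For a stable-toolchain view that needs no pointer casts, a rough sketch like the following prints the sizes involved. `option_overhead` is a name invented here. Note the limits: this only shows how many bytes `Option` adds (tag plus padding), not how many of those bytes the compiler treats as the tag, so it cannot tell a u8 tag from a u64 one.

```rust
use std::mem::{align_of, size_of};

const SIZE: usize = 256;

// Same definition as in the question.
#[allow(dead_code)]
#[repr(C, align(8))]
struct Foobar {
    d: [u8; SIZE],
    a: u64,
    b: u64,
    c: u32,
}

/// Hypothetical helper: bytes that `Option` adds on top of `Foobar`
/// (the tag plus whatever padding follows it).
fn option_overhead() -> usize {
    size_of::<Option<Foobar>>() - size_of::<Foobar>()
}

fn main() {
    // `Foobar` is repr(C), so its own size and alignment are well defined.
    println!("size_of::<Foobar>()         = {}", size_of::<Foobar>());
    println!("align_of::<Foobar>()        = {}", align_of::<Foobar>());
    // `Option<Foobar>` is repr(Rust): these numbers are observations only.
    println!("size_of::<Option<Foobar>>() = {}", size_of::<Option<Foobar>>());
    println!("overhead of Option          = {}", option_overhead());
}
```

Because of `align(8)`, the payload has to start on an 8-byte boundary, so the slot in front of it is padded out regardless of how wide the tag itself is.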


Ahh yes, so between 64-bit and 32-bit, rustc_layout actually reports a different tag type: u64 vs. u8 (which explains the first UB part). Do we know why that is the case?

If you used repr(u32), it'll be 4 bytes. (And similarly for the other types you can put there.)

If you didn't pick exactly the type yourself, then the compiler is free to use whatever size it wants, and can change between compiler versions (or even builds, arguably).
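A small sketch of what picking the type yourself looks like. The enums `Wide` and `Packet` and the helper `packet_tag` are invented for this example. With a primitive representation the tag has a defined type and position, so reading it through a pointer cast is well defined, which is not the case for the default representation.

```rust
use std::mem::size_of;

// Fieldless enum with an explicit 4-byte tag.
#[allow(dead_code)]
#[repr(u32)]
enum Wide {
    A,
    B,
}

// Data-carrying enum with an explicit 1-byte tag. With a primitive
// representation, the tag is stored first, followed by the variant's fields.
#[allow(dead_code)]
#[repr(u8)]
enum Packet {
    Empty,    // tag 0
    Byte(u8), // tag 1
}

/// Reads the tag of a `#[repr(u8)]` enum through a pointer cast.
fn packet_tag(p: &Packet) -> u8 {
    // SAFETY: `Packet` is `#[repr(u8)]`, so its first byte is the `u8` tag.
    unsafe { *(p as *const Packet as *const u8) }
}

fn main() {
    println!("size_of::<Wide>()   = {}", size_of::<Wide>());
    println!("size_of::<Packet>() = {}", size_of::<Packet>());
    println!("tag of Empty        = {}", packet_tag(&Packet::Empty));
    println!("tag of Byte(7)      = {}", packet_tag(&Packet::Byte(7)));
}
```

The trade-off is that an explicit repr turns off niche optimizations for that enum, so the tag always takes its own storage.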

The layout of #[repr(Rust)] enums is up to the compiler. There may even be no separate discriminant "field" at all, e.g. when a variant contains a field of a numeric or pointer type with a niche, like NonZero<T> or NonNull<P>.
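As a quick illustration of the "no separate discriminant" case, here is a sketch with two made-up enums, `Slot` and `PlainSlot`, plus a hypothetical `make_slot` helper. The sizes it prints are what current compilers happen to choose, not something the language promises for user-defined enums.

```rust
use std::mem::size_of;
use std::num::NonZeroU8;

// `NonZeroU8` can never be 0, so the compiler may encode `Empty` as the
// otherwise-invalid value 0 instead of adding a separate tag byte.
#[derive(Debug, PartialEq)]
enum Slot {
    Empty,
    Full(NonZeroU8),
}

// Same shape, but `u8` uses all 256 bit patterns, so a tag needs its own byte.
#[allow(dead_code)]
enum PlainSlot {
    Empty,
    Full(u8),
}

/// Hypothetical helper: builds a `Slot`, mapping 0 to `Empty`.
fn make_slot(n: u8) -> Slot {
    NonZeroU8::new(n).map_or(Slot::Empty, Slot::Full)
}

fn main() {
    println!("size_of::<Slot>()      = {}", size_of::<Slot>());
    println!("size_of::<PlainSlot>() = {}", size_of::<PlainSlot>());
    println!("{:?} {:?}", make_slot(0), make_slot(5));
}
```

Safe code still sees two distinct variants in both cases; only the storage differs.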

Note that some niche types are internal to the standard library and not exported, so you cannot use them directly, but you do see them in action when you use types that contain them, such as the capacity field of Vec (and, by extension, String): it uses only half the range of a usize, so on 64-bit platforms, for example, you can essentially fit a 63-bit discriminant into the niche. See this example: