Are discriminants between different instances of a generic enum stable?

I'm doing something weird, probably, but here's some code to illustrate the idea: Rust Playground

What I'm essentially doing there is using something resembling trees that grow, i.e. an enum representing some AST, extensible with different metadata based on a generic parameter tag. I want to disable some enum variants in the Limited version. Any non-disabled constructors have the same metadata between Full and Limited versions (not enforced by the type system, but as a programmer, I guarantee that is the case).

Now the question is, is std::mem::transmute-ing the Limited enum to a Full one "safe"? In other words, does Rust's choice of enum discriminants depend on the data payload of variants? Does it depend on anything at all, or is it just completely arbitrary?

Now, if that enum didn't have a #[repr(u8)] annotation, that would obviously be unsafe (furthermore, Rust would refuse to transmute differently-sized types). With #[repr(u8)] the layout is guaranteed to be the same, and it seems to work in the limited testing I did. But if Rust can arbitrarily assign different discriminants to the "same" enum variant between A<Limited> and A<Full>, this would be woefully unsafe.
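The playground code is not reproduced in this thread, so the following is only a guessed minimal sketch of the setup being described. The names `ExtA`, `Full`, `Limited` and `Void` follow the ones used later in the discussion, but the exact definitions and the helper `widen` are assumptions.

```rust
use std::mem;

// Hypothetical reconstruction of the setup; not the original playground code.
pub trait ExtA {
    type A;
    type B;
}

pub struct Full;
pub struct Limited;

/// Uninhabited type used to "disable" a variant.
pub enum Void {}

impl ExtA for Full {
    type A = ();
    type B = ();
}

impl ExtA for Limited {
    type A = ();
    type B = Void; // variant B can never be constructed for Limited
}

#[repr(u8)]
pub enum A<Ext: ExtA> {
    A(Ext::A),
    B(Ext::B),
}

/// Reinterprets a Limited value as a Full one without traversing it.
pub fn widen(x: A<Limited>) -> A<Full> {
    // Compiles only because both instantiations have the same size.
    // Soundness relies on both types assigning the same tag to variant A,
    // which is exactly what the question is asking about.
    unsafe { mem::transmute(x) }
}
```

Here `widen(A::A(()))` is expected to come back as the `A` variant of `A<Full>`, provided the discriminants agree between the two instantiations.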

For context, the actual enum is significantly more involved, including recursion, hence I want to avoid traversing it.

Apologies if this was asked before, this is kinda niche and specific, so searching for it turns out to be very hard.

I don't think there's such a guarantee, much less that you should use unsafe and transmute even if there were.

It would be better to show code that demonstrates your actual problem. As-is, your conversion can be written using entirely safe code.

The safe code is obvious, yes. The problem is it's O(n) if enum variants are self-referential, and the application is performance-sensitive. OTOH, transmute is basically free.

For a specific toy example, assume something like this:

#[repr(u8)]
pub enum A<Ext: ExtA> {
    A(Ext::A, Box<Self>),
    B(Ext::B),
}

I didn't want to sidetrack the discussion, hence the example is as small as I could make it to illustrate the question.
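For comparison, the safe conversion mentioned above might look like the following sketch for a recursive enum. It assumes an extra `Nil` leaf variant (not in the toy example) so that a finite `A<Limited>` can actually be built, and the helpers `widen` and `depth` are hypothetical names. The recursion visits every node, which is the O(n) cost in question.

```rust
// Hypothetical definitions for illustration only.
pub trait ExtA {
    type A;
    type B;
}

pub struct Full;
pub struct Limited;
pub enum Void {}

impl ExtA for Full {
    type A = u32;
    type B = ();
}

impl ExtA for Limited {
    type A = u32;
    type B = Void;
}

pub enum A<Ext: ExtA> {
    A(Ext::A, Box<Self>),
    B(Ext::B),
    Nil, // assumed leaf variant so the recursion can terminate
}

/// Safe conversion: rebuilds the tree node by node, so it is O(n).
pub fn widen(x: A<Limited>) -> A<Full> {
    match x {
        A::A(meta, child) => A::A(meta, Box::new(widen(*child))),
        // Void has no values, so this arm can never run.
        A::B(void) => match void {},
        A::Nil => A::Nil,
    }
}

/// Counts the number of nested A nodes.
pub fn depth<Ext: ExtA>(x: &A<Ext>) -> usize {
    match x {
        A::A(_, child) => 1 + depth(&**child),
        _ => 0,
    }
}
```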

much less that you should use unsafe

I believe the general consensus is "it's fine, just make sure you're not breaking invariants".

unsafe is not for arbitrarily breaking the high-level type system. It's not to be used lightly. It's for when low-level, primitive operations can't be proven by the compiler to be safe (e.g., FFI with C code that doesn't have static ownership/borrowing information, or doing very simple operations over built-in types, like implementing integer to byte array conversions).

If you share the actual problem domain with us, we can perhaps help you write performant code that doesn't need to be wildly unsafe.

By the way, this looks even more dubious than your original example: not only does it transmute a single value, it also transmutes every pointed-to value, transitively, including pointers, and transmuting pointers is almost always wrong. Given a pointer-like type P<T> with pointee type T, it is not guaranteed that P<T> has the same representation as P<U>, thus your transmute with the recursive Box<Self> may well be UB.

may well be UB.

AFAICT, it isn't, at least with Box, unless the initial example is UB: both pointed-to types have the same size and alignment, and pointed-to value is valid as both types as long as discriminants are the same. But as I said, I don't want to sidetrack the discussion.

This is true because () and Void have the same size and alignment. Things break (fortunately at compile time) when actual data is present in B:

impl ExtA for Full {
    type A = ();
    type B = String;
}

Ugh, I shouldn't have used (). Assume for the sake of argument I've ensured the size is indeed the same (a union is as large as the largest type in the union, so worst case, padding with a bogus variant is an option).

Edit: actually, just simply adding align(xx) on the enum would serve the purpose, where xx is the size of the largest removed variant (rounded up to a power of 2).

According to the reference, implicitly-specified discriminants always start at zero and go up by one. Also, #[repr(u8)] on an enum is shorthand for #[repr(C,u8)].

So, it should be sound to transmute between two #[repr(u8)] enums given that:

  • They have the same number of variants,
  • Any explicitly-tagged variants have the same discriminant, and
  • Every field type within corresponding variants is transmute-safe
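As a rough illustration of the conditions above, here is a sketch with two made-up `#[repr(u8)]` enums, `Signed` and `Unsigned`, and a hypothetical helper `reinterpret`. Both have two variants with implicit discriminants, and the corresponding fields (`i32` to `u32`, `bool` to `u8`) have the same size and alignment and are valid in the direction of the conversion. Note that `transmute` also requires the two enums to have the same overall size, which holds here.

```rust
use std::mem;

// Hypothetical types for illustration only.
#[repr(u8)]
pub enum Signed {
    Num(i32),   // tag 0
    Flag(bool), // tag 1
}

#[repr(u8)]
pub enum Unsigned {
    Num(u32), // tag 0
    Flag(u8), // tag 1
}

/// Reinterprets the bits of a Signed value as an Unsigned one.
pub fn reinterpret(s: Signed) -> Unsigned {
    // Every i32 bit pattern is a valid u32, and every bool (0 or 1) is a
    // valid u8. The reverse direction would NOT be sound for Flag, since
    // not every u8 is a valid bool.
    unsafe { mem::transmute(s) }
}
```

The direction matters: the field types only need to be valid when read as the target type, so a conversion can be fine one way and undefined behavior the other way.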

Edit: According to the link @quinedot shared below, there is actually a subtle difference between #[repr(u8)] and #[repr(C,u8)], but I don't think it makes any difference to my analysis here:

  • #[repr(u8)] is a union of #[repr(C)] structs with a u8 tag as the first field
  • #[repr(C,u8)] is a #[repr(C)] 2-tuple of a u8 tag and a union of #[repr(C)] structs without an embedded discriminant field

So, don't try to transmute between #[repr(u8)] and #[repr(C,u8)] without being very careful about padding & alignment restrictions.

I haven't looked at exactly what you're trying to do in depth, but a prerequisite for it being sound would be: Is the layout of enums guaranteed? They are not guaranteed in the default representation, but you can guide the representation.

Last I checked, this was the most understandable guide on enum layouts. If you use an explicit representation, and can similarly constrain the layouts of the contained data types, there may be a way to make it sound.

implicitly-specified discriminants always start at zero and go up by one

Huh. I've looked through the reference, but I assumed that section only refers to fieldless enums. Looking at the rest of the page, it is rather evident I was mistaken. Thanks for pointing to it!
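One way to check this for a data-carrying enum is to read the tag directly, using the pointer cast that the reference describes for enums with a primitive representation. The enum `Shape` and the method `tag` below are made-up names for a minimal sketch.

```rust
// Hypothetical enum for illustration only.
#[repr(u8)]
pub enum Shape {
    Circle(f32),
    Rect(f32, f32),
    Empty,
}

impl Shape {
    /// Reads the discriminant of a #[repr(u8)] enum.
    pub fn tag(&self) -> u8 {
        // SAFETY: with #[repr(u8)], every variant is laid out as a
        // #[repr(C)] struct whose first field is the u8 tag, so the
        // first byte of the value is the discriminant.
        unsafe { *(self as *const Self).cast::<u8>() }
    }
}
```

With implicit discriminants, the three variants should report 0, 1 and 2 in declaration order, regardless of their payloads.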

Thanks for the link. I've seen that page, and it is indeed quite handy. Unfortunately, it's not very clear from it whether data-carrying enums follow the same rules as fieldless ones. The Rust reference, as pointed out by @2e71828 above, is somewhat more explicit about it.

I would suggest just setting them yourself.

As of a couple of releases ago, it's stable to do things like

#[repr(i8)]
pub enum Foo {
    Bar(u32) = 10,
    Qux(String) = -1,
}

https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=cf7b71872f515b65dc104e22ec671ad5

Then you don't need to worry about how things are picked.
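A small sketch of how the explicit values can be observed, reusing the `Foo` enum from this post. The `discriminant` method is a hypothetical helper that reads the `i8` tag through a pointer cast, which relies on the `#[repr(i8)]` layout.

```rust
#[repr(i8)]
pub enum Foo {
    Bar(u32) = 10,
    Qux(String) = -1,
}

impl Foo {
    /// Reads the explicitly assigned discriminant.
    pub fn discriminant(&self) -> i8 {
        // SAFETY: with #[repr(i8)], the value starts with the i8 tag.
        unsafe { *(self as *const Self).cast::<i8>() }
    }
}
```

Pinning the values this way means two enums meant to be transmute-compatible can state the shared tags in the source rather than relying on declaration order.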

This is not relevant to the original question, but no, they are not equivalent. The text describing primitive representations says

The representation is … a repr(C) union of repr(C) structs for each variant with a field.

That is, #[repr(u8)] enum is laid out as if you wrote a certain #[repr(C)] union, not as if you wrote #[repr(C, u8)] enum.

(The difference between repr(u8) and repr(C), given how each is defined, is that repr(C) has a consistent amount of padding between the discriminant and the field data — the data always starts at the same offset regardless of which variant you are looking at — and repr(u8) does not have that consistency. So, repr(u8) may be more compact but also may be harder to use for FFI.)
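The layout difference can show up in the size of the type. The sketch below uses two made-up enums, `Compact` and `Padded`, with identical variants and differing only in their `repr`. The sizes in the comments follow from the layout rules quoted above, assuming `u16` has alignment 2.

```rust
use std::mem::size_of;

// Hypothetical enums for illustration only.

// Laid out as a union of #[repr(C)] structs, each starting with the tag:
//   A: { tag: u8, u8, u16 } -> 4 bytes
//   B: { tag: u8, pad, u16 } -> 4 bytes
#[repr(u8)]
pub enum Compact {
    A(u8, u16),
    B(u16),
}

// Laid out as { tag: u8, payload: union } where the payload union holds
// { u8, u16 } (4 bytes) or { u16 } (2 bytes). The union has alignment 2,
// so it starts at offset 2, giving 6 bytes in total.
#[repr(C, u8)]
pub enum Padded {
    A(u8, u16),
    B(u16),
}

/// Returns (size of Compact, size of Padded).
pub fn sizes() -> (usize, usize) {
    (size_of::<Compact>(), size_of::<Padded>())
}
```

In `Compact`, the `u8` field of variant `A` sits right after the tag, while in `Padded` every variant's data starts at the same offset, which costs two extra bytes here.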
