Enumerations: how are they stored and other questions

This is just self-made exercise. Let's take an enumeration:

enum Message {
    Quit,
    Move { x: i32, y: i32 },
    // ...
}

I am exploring the quote below, without that much concern about the original context:

Rust sees that Message::Quit doesn’t need any space, (...)

First aspect: Does it have a value?

No value at all seems impossible. When we test in a match:

match instance {
    Message::Quit => //...
    /*...other variants ... */
}

it's comparing two values. So it is represented by a value, but as the text says, it may still not store a value, I suppose.

Second aspect: a simple model

Maybe an initial, simplified model of an enum instance could be

  1. Identity: A value for the enumeration variant, as shown below (a numeric Identifier)

    enum Enum {
    Baz,  // or Baz = 0
    Bar,   // or Bar = 1
    // etc
    }
    
  2. Data: for field-full Variants the data or a pointer to the data. Field-less have nothing.

Number 1. is also what I interpret from the enums::reference.

In this case, the reason why ::Quit would not take space is that it has no data, but not that there isn't any information about itself / its variant in memory (which seems ridiculous but who knows.) It's value would be, say, 0.

Third aspect: Variants with Data

How does it extend to field-full variants like Move(i32,i32)?

When we do a match would it be comparing:

  1. Something like a 2 assigned
  2. And then each number
match instance {
    // each is a "2"-id-match, 
    // then tests "data" fields
    Message::Move(1,2)=>// ...
    Message::Move(4,5)=>//...
    /*...other variants ... */
}

I'm just hoping for corrections, links or interesting considerations you would add.

The book has a section on this topic:

I don't think it replies any of the questions I raised. Can you let me know which, or where? I did read this already.

A more precise statement would be that it doesn't need any space for its fields. “You have a Message::Enum” is itself some information that has to be stored somewhere.

This is accurate, except that:

  • We call the “identity” the “discriminant”. Reference.

    (Also, sometimes the discriminant can be stored in a way that combines it with a field, when the field has limited values it can take on; this is called the “niche optimization”. For example, in Option<&T>, the reference cannot be null (zero), so the None value is represented as zero in the same memory that would be the reference. This doesn’t change the semantics of an enum or how it is matched, but it means it takes up less space overall.)

  • Enums never implicitly use pointers. They merely store their fields, and the field's type might be a pointer type.

This is indeed what happens, but I would describe it differently:

  • First, compare the discriminant to determine if the enum is of this variant.
  • Then, match each field value against each field pattern.

That is, once the discriminant test is done, an enum pattern's own job is done; everything else is delegated to the fields (and is exactly the same as a struct pattern match).

enums can be thought of as "sum types" even though technically they are not since the variants are not actually types[1]. In reality, a unit variant of an enum, Foo, can be thought to have type fn() -> Foo[2]. So it depends on what you mean by "has a value". If they were types though, then a unit variant would be a unit type just like ().

enum Foo {
    // Sort of like `fn() -> Foo`.
    // If this _were_ a type, it would be equivalent to `struct Unit;`.
    Unit,
    // `fn() -> Foo`.
    // If this _were_ a type, it would be equivalent to `struct NullaryTuple();`.
    NullaryTuple(),
    // `fn(u32,) -> Foo`.
    // If this _were_ a type, it would be equivalent to `struct UnaryTuple(u32,);`.
    UnaryTuple(u32,),
    // This doesn't make sense, but sort of like `fn({}) -> Foo`.
    // If this _were_ a type, it would be equivalent to `struct FieldlessStruct {}`.
    FieldlessStruct {},
    // This doesn't make sense, but sort of like
    // `fn({ x: u32, y: u32, }) -> Foo`.
    // If this _were_ a type, it would be equivalent to `struct FieldStruct { x: u32, y: u32, }`.
    FieldStruct { x: u32, y: u32 },
}

Despite the stated RFC not making the variants a subtype of the enum; theoretically the variants would also be subtypes.[3]

Now how much storage is needed for an enum? Well the purpose of an enum is to represent an instance of exactly one type from a set of possible types. From this, the size of an enum will be at least the size of the largest variant. Of course, you need to track which specific variant as well so this may increase the storage depending on how many variants there are.

/// A true "uninhabited type". The total number of possible
/// instances of this is 0: `size_of::<Uninhabited>() == 0`.
enum Uninhabited {
}
/// There is exactly one instance of `Zst`: `Zst::Foo`;
/// however since there is only one instance, we can
/// optimize the storage such that it takes 0 bytes since
/// there is no runtime information to be had. Thus, like
/// `()` which has exactly one instance,
/// `size_of::<Zst>() == 0`.
enum Zst {
    Foo,
}
/// Despite there being two variants, there is only one
/// instance since `AnotherZst::Uninhabited` can never
/// be constructed; thus this is optimized to take 0 bytes.
enum AnotherZst {
    Foo,
    Uninhabited(Uninhabited),
}
/// Of course there is no guarantee that an `enum`
/// that _could_ be a ZST will actually be:
/// `size_of::<NotZst>() == 4`. This is not surprising
/// seeing how `(Uninhabited, u32)` is also not a ZST
/// despite not being constructable.
enum NotZst {
    Uninhabited(Uninhabited, u32),
}

  1. There was an RFC to make enum variants proper types, but it has since been closed. ↩︎

  2. Technically this is not true as only tuple variants have fn type. ↩︎

  3. The RFC does state "permitting variant types to be subtypes of enum types" as a future possibility. ↩︎

The language in the document you're quoting is discussing the size contributed by the enum variant's associated data (i.e., its fields). Since Move Quit has none, those nonexistent fields contribute nothing to the size of the enum. However, the enum taken as a whole must necessarily still be large enough to

  • hold all the fields of any other variant, and
  • distinguish the variants from one another.

In practice, this means that even an enum consisting of only unit variants takes up some space, to hold a discriminant identifying which variant the value holds, with a few exceptions for things like the niche value optimization. In principle an enum consisting of only a single, unit variant might be a zero-size type; I haven't checked whether it actually is, or not.

When we do a match would it be comparing:

  1. Something like a 2 assigned
  2. And then each number
match instance {
    // each is a "2"-id-match, 
    // then tests "data" fields
    Message::Move(1,2)=>// ...
    Message::Move(4,5)=>//...
    /*...other variants ... */
}

From the point of view of a language user, each match arm gives a pattern that must match the value provided for that arm to be taken. The arm Message::Move(1, 2) will only be taken if these three conditions are all met:

  • instance holds a Message::Move variant,
  • The first field of that variant is exactly 1, and
  • The second field of that variant is exactly 2.

From an implementor's point of view, or if you're reading the resulting code, you'd likely see some variation on three integer comparisons - one for the discriminant, one for the first field, and one for the second field. However, they might be packed into less than three distinct operations; for example, if the variant Move(1, 2) ultimately fits within a single 64-bit word, the compiler might emit a single 64-bit integer comparison rather than three separate comparisons, if the resulting code correctly implements the match statement.

It is, though I don't know if I've seen a formal guarantee of this. OTOH it'd be a breaking change to make them non-ZSTs, since being a ZST can be load bearing elsewhere in the language (niche guarantees, repr(transparent), aliasing concerns maybe).

It's guaranteed if you repr(transparent) it:

#[repr(transparent)]
pub enum StructInDisguise {
    Hello,
}

https://play.rust-lang.org/?version=stable&mode=debug&edition=2024&gist=dbe11ef68bf7905a4acf43d97aa6caf6

[src/main.rs:7:5] size_of::<StructInDisguise>() = 0

Ah, great.

But if it's not guaranteed without repr(transparent), shouldn't there be an issue analogous to this one where repr(transparent) is transitively required to be considered a ZST within another repr(transparent)?

Sort of like this, but since the non_exhaustive portion is covered by the other issue and enums can't have private variants, it's not about what the enum owner is allowed to do, it's about what the compiler is allowed to do.

Based on quick testing using rustc 1.88.0, the following list of types seems to be ZSTs:

  • !
  • ()
  • n-ary tuples iff each of the contained types is a ZST (vacuously true for a nullary tuple[1])
  • [T; N] iff N is 0 or T is a ZST
  • Unit structs
  • Tuple structs iff the equivalent tuple is a ZST
  • structs with fields iff each field is a ZST (vacuously true for field-less structs)
  • enums where all variants, sans at most 1, are uninhabitable and ZSTs; and the at most one remaining variant is a ZST (vacuously true for enums without variants)

While I realize those types may not be guaranteed to be ZSTs, they currently are now.


  1. I realize there is no nullary tuple since it's replaced with the unit type (); but when talking about tuple structs and enums that have tuple variants, the property holds when n is 0. ↩︎

I am reading this one to see exactly how an example would be laid out in memory, not sure it has it though.

I do know the concept of alignment but not much else.

So far what I read is that the discriminant is an isize; so a few bytes are always reserved for it.

also I didn't quite get what that statement means. But the rest I did.

TL;DR

I think my question from tests below reduces to:

enum E {
    A = 5,
}
assert_eq!(E::A as isize, 5);
  • Where is the 5 stored, so that the test passes?

Apparently something like "compiler removes the enum altogether and uses a 5 instead" happens. But isn't that still an isize` ?


I ran some simple tests of enum variants (on the default representation, I'm not interested in other representations).

The discriminant is sometimes non-existent or "Zero-Sized".

This happens for enums with no variants (not unreasonable) but also for 1 variant (even if one assigns OneVariant=256), which is strange.

After all one can match it so the discriminant must exist somewhere; I don't see how something of size 0 could have a discriminant.

For example, the following assert_eq! doesn't panic!; I'd expect it does from "size 0" I guess.

    enum Enum { Foo }

    assert_eq!(0, Enum::Foo as isize);

although that's consistent with why enum Enum { Foo = 256 } also takes no space as tested earlier.

I do see that some existing values have size 0.


PS: sorry i kept editing as i found new information.

If you do not request any specific layout from the compiler, the compiler avoids making guarantees whether something is stored in memory.

For the first example,

  1. E has only one variant with no fields, so no space is required; any value of E must have variant A.
  2. E::A is a value of E and therefore needs no space.
  3. <value of type E> as isize matches the provided value against the inhabited variants, which happen to be E::A and nothing more, and returns the associated discriminant integer.
    Importantly, it does not necessarily read the memory behind the value, nor directly return it.

So, that 5 is stored by compiler.

That's not correct. For example this type only takes 2 bytes:

enum Foo {
    A(u8),
    B(u8),
}

1 byte for the discriminant to distinguish A from B, and 1 byte for the u8.

In n bits you can describe a choice between 2n possibilities. In particular, you need 0 bits to describe a choice between 20 = 1 possibility.

If there is only one option, you don't have to say anything to tell me which option you chose.

Theoretically 5 is stored somewhere in static memory. You only need one copy of the 5, not one copy per instance of E because every instance of E shares the same value.

In this particular example the 5 can get optimized out from the generated machine code because it's only used in an expression that can be computed at compile time.

I don't disagree, but what I described is from here.

Under the Rust representation, the discriminant is interpreted as an isize value. However, the compiler is allowed to use a smaller type (or another means of distinguishing variants) in its actual memory layout.

And assuming the default repr there was Rust's one.

I think I didn't interpret the second sentence correctly earlier, and is what allows it to be 1 byte, or even 0 apparently.

I understand this but am unsure in reality you can use 0 bits to store a number; 1 possibility may still need 4 bytes if it's a u32. You need a place to store it and that is what you mentioned at last in my opinion.

So you seem to be talking about choices but I am also talking about values.

So I think one of the issues in the text/explanations is that there isn't a distinction between stack/heap (or the instance-used memory) vs other memories.

Yes but that place doesn't need to be part of the enum instance, the table of values can be stored separately and shared by all the instances.

There isn't even necessarily a “table” of any sort. For example, consider these functions.

pub enum Enum1 {
    A = 5,
}
pub enum Enum2 {
    A = 5,
    B = 6,
}

#[inline(never)]
pub fn e1_value() -> Enum1 {
    Enum1::A
}

#[inline(never)]
pub fn e1_int() -> isize {
    Enum1::A as isize
}

#[inline(never)]
pub fn e2_value() -> Enum2 {
    Enum2::A
}

Put them in the Playground and ask it for the optimized assembly code. You will get:

playground::e1_value:
	ret

playground::e1_int:
	mov	eax, 5
	ret

playground::e2_value:
	mov	al, 5
	ret

e1_value doesn't need to do anything at all, because it's returning a zero-sized type. e1_int needs to produce 5, but it does not do it by consulting a table in memory; it stores the integer 5 into a register, and the value originates from that the literal argument of that one mov instruction — in-line in the rest of the machine code.

e2_value demonstrates that when the enum is not zero-sized, the explicit discriminant values are also used and so e2_value must actually return a value designating Enum2::A rather than Enum2::B. Note that the function writes the 5 into al instead of eax; that’s because al refers to a one-byte register (which happens to also be the lowest byte of eax, but that doesn’t particularly matter) and the enum's representation is just one byte.

But what about reading an enum? Will that have a table of values? Not necessarily.

#[inline(never)]
pub fn make_a_choice(value: Enum2) -> &'static str {
    match value {
        Enum2::A => "apple",
        Enum2::B => "banana",
    }
}
playground::make_a_choice:
	xor	edx, edx
	cmp	dil, 5
	lea	rcx, [rip + .Lanon.7c78d491cad091a45e9d39e7c3d3e217.0]
	lea	rax, [rip + .Lanon.7c78d491cad091a45e9d39e7c3d3e217.1]
	cmove	rax, rcx
	setne	dl
	add	rdx, 5
	ret

It simply compares the incoming value with a literal 5, because that is all the choice that needs to be made. It might even compute a numerical result by doing arithmetic on the discriminant value. The compiler will do whatever is most efficient (as far as it can tell) when choosing the concrete representation (if not overridden), and when generating machine code to manipulate that representation.

Which isn’t to say that there aren’t ever tables. If we make a bigger match:

enum SeveralThings {
    One = 1,
    Two = 2,
    Three = 3,
    Four = 4,
    Five = 5,
}

#[inline(never)]
pub fn bigger_choice(value: SeveralThings) -> u32 {
    match value {
        SeveralThings::One => 7,
        SeveralThings::Two => 1000,
        SeveralThings::Three => 9,
        SeveralThings::Four => 0,
        SeveralThings::Five => 14,
    }
}
playground::bigger_choice:
	movsx	rax, dil
	lea	rcx, [rip + .Lswitch.table.playground::bigger_choice]
	mov	eax, dword ptr [rcx + 4*rax - 4]
	ret

Now it is fetching values from a table. (The lea instruction obtains the base address of the table, and the mov instruction loads the value from the appropriate index in that table. Unfortunately, the Playground’s filtered output doesn’t actually include the table itself.)

If there is only one value an expression could possibly have, then the compiler is allowed to infer the use of that value without threading it through the whole program from where that value is nominally produced to where it is consumed. Rust - and in fact many modern compilers - do this where possible, because it saves on time and code size at no cost to the language user. As a result, an enum with only one variant doesn't need a discriminant - since that's the only variant a value of that type could possibly have, the type alone is enough for the compiler to reason about the value.

One consequence of that is that your program

enum E {
    A = 5,
}
assert_eq!(E::A as isize, 5);

is compiled away entirely when optimization is turned on. The compiler is able to observe that the conversion to isize must return 5, so the comparison in the expansion of assert_eq always passes, so the panic is never reached, so the whole thing can be thrown away.

Conversely, a failing version

enum F {
    A = 6,
}
assert_eq!(F::A as isize, 5);

compiles to an unconditional panic, since the compiler is able to observe that F::A as isize is never equal to five, so it doesn't need to emit code to check at runtime.

Rust doesn't do this for more than 1 values, but there could be such an optimization. Consider:

enum Foo {
    A = 1231231,
    B = 3434334,
}

It could potentially be stored in 1 byte, an offset to a table of 2 values.