Get size of enum variant

Is there any sound and portable (not necessarily safe or stable) way to get the size of a specific variant of an enum? Unfortunately this is a bit more complicated than just getting the size of the associated type, due to alignment rules (e.g. size_of::<OptionButWithReprU8<u16>>() will be 4, not 3).
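
You can check the alignment point directly. Here is a minimal sketch. It assumes OptionButWithReprU8 is an Option lookalike marked #[repr(u8)]; the name comes from the question, but this definition is a guess. The u8 tag sits at offset 0, and the u16 payload has to start at an even offset, so one padding byte goes between them.

```rust
use std::mem::{align_of, size_of};

// Hypothetical stand-in for the enum named in the question: an Option
// lookalike whose discriminant is stored as a u8.
#[allow(dead_code)]
#[repr(u8)]
enum OptionButWithReprU8<T> {
    None,
    Some(T),
}

fn main() {
    // 1 tag byte + 2 payload bytes would be 3, but the u16 must be 2-aligned,
    // so a padding byte sits between the tag and the payload.
    println!("size  = {}", size_of::<OptionButWithReprU8<u16>>()); // 4
    println!("align = {}", align_of::<OptionButWithReprU8<u16>>()); // 2
}
```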

Preferably, for enums with niche optimization enabled, this would return the size of the entire enum, or possibly the size required to represent it up to the last field with a niche, but a solution that only works on primitive-representation enums is also acceptable.

Variants aren't distinct types. Option::None is the same size as the corresponding Option, as it must be in order to pass &None to something that wants &Option.

I'm well aware of the semantics of enums within safe Rust. Let me define what I mean by "the size of a variant":

the size of a variant V within an enum T is the minimum value of N such that:

```rust
let old = T::V(somevalue);
unsafe {
  // not undefined behavior unless T is not valid for all byte patterns,
  // for the sake of simplicity we will just assert that about T,
  // but this could be done by using alloc() manually, since
  // raw pointers can contain invalid values without invoking undefined behavior
  let new: T = std::mem::transmute([any_bytes; size_of::<T>()]);
  std::mem::copy(&old as *const T as *const u8, &new as *const T as *const u8, N)
}
// at this point the only uninitialized values are within the padding,
// which can contain any value.
assert_eq!(&old, &new);
```

never panics for any value of any_bytes and somevalue, and for any safe implementation of PartialEq (unsafe code could read the padding bytes directly).

&T isn't a problem, but &mut T could be, since writing through it could mutate memory past the end of the variant. This can be solved by never having the unsafe code produce a mutable reference.

While this is true, there is a sense in which each variant has a size (how much of the overall enum size is used and how much is unused), and it can be useful to know which variant is largest if you are trying to optimise the overall size of the enum.

I don't think this is a useful measure of the size of a variant for repr(Rust). There is no guarantee that Rust tries to "push" the variant's fields towards the start of an enum; it could (theoretically at least) decide it is better to place a field at the end in order to get more advantageous niches across multiple variants. Or it could just do it anyway (layout randomization, -Z randomize-layout, is a nightly option). (I would argue that the compiler should try to push things together if all other things are equal, in order to improve cache utilisation.)

Does the compiler do any of that? Not a clue, but it is allowed to.

To reframe my question: is there a way to find the number of trailing padding bytes in an enum variant?

If you're just trying to diagnose memory usage, knowing the total amount of padding in a variant would probably be more useful, but for my use case, trailing padding is what I want to know.

One strategy:

  1. Ensure the enum's repr is C or primitive, so that the discriminant is known to be at the beginning and not the end.
  2. Get the offset_of each field of the variant.
  3. Add each field's type's size to the offset, so you have their end points.
  4. Take the maximum of those end-points.

That gives you the size without trailing padding, which you can subtract from the type size to get the trailing padding.
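
Here is a sketch of those steps for a hypothetical #[repr(u8)] enum E. At the time of writing, offset_of! only accepts enum variant fields behind the unstable offset_of_enum feature. So this version measures the field addresses of a live value instead: it subtracts the enum's address from each field's address. That means it needs one value per variant you want to measure.

```rust
use std::mem::{size_of, size_of_val};

// Hypothetical example enum. With #[repr(u8)] each variant is laid out like a
// #[repr(C)] struct whose first field is the u8 tag, so the tag is at offset 0.
#[allow(dead_code)]
#[repr(u8)]
enum E {
    A(u16, u8),
    B(u32),
}

/// Byte offset of `field` from the start of `base` (`field` must point into `base`).
fn offset_in<T, F>(base: &T, field: &F) -> usize {
    field as *const F as usize - base as *const T as usize
}

/// Steps 2-4: end point of every field of the variant `e` holds, then the max.
fn variant_end(e: &E) -> usize {
    match e {
        E::A(x, y) => (offset_in(e, x) + size_of_val(x)).max(offset_in(e, y) + size_of_val(y)),
        E::B(x) => offset_in(e, x) + size_of_val(x),
    }
}

fn main() {
    for e in [E::A(1, 2), E::B(3)] {
        let end = variant_end(&e);
        // A: fields end at byte 5 of 8, so 3 trailing padding bytes; B: 8 of 8, so 0.
        println!("end = {end}, trailing padding = {}", size_of::<E>() - end);
    }
}
```

Taking the maximum end point, rather than the last declared field, matters only if the layout could reorder fields. It costs nothing, so it is kept here.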

If you need the size and offset with no other restrictions, you need a guaranteed layout. Or I find this a much clearer read for enums.

On repr(Rust) enums, first of all, you need the POD quality (valid for all bytes), or you might clobber the discriminant by overwriting a niche. Given that, an alternative is to make all your variants wrap a single inner type, get a reference to the inner type, and calculate the offset from the reference.
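
Here is a sketch of that single-inner-type approach. It assumes a hypothetical Payload struct wrapped by the data-carrying variant. It takes a reference to the inner value and subtracts the enum's address. The enum is repr(Rust), so the actual offset is not guaranteed and can only be observed. The resulting range also includes Payload's own trailing padding, which the enum cannot reuse anyway, because a &mut Payload may overwrite it.

```rust
use std::mem::size_of;
use std::ops::Range;

// Hypothetical payload: the data-carrying variant wraps exactly one struct.
#[allow(dead_code)]
struct Payload {
    a: u16,
    b: u8,
}

#[allow(dead_code)]
enum E {
    Data(Payload),
    Other(u64),
}

/// Byte range of `e` occupied by the wrapped `Payload`, if `e` is `E::Data`.
fn payload_range(e: &E) -> Option<Range<usize>> {
    match e {
        E::Data(p) => {
            // Offset = address of the inner value minus address of the enum.
            let start = p as *const Payload as usize - e as *const E as usize;
            Some(start..start + size_of::<Payload>())
        }
        E::Other(_) => None,
    }
}

fn main() {
    let e = E::Data(Payload { a: 1, b: 2 });
    // Exact numbers depend on the (unspecified) repr(Rust) layout.
    println!("{:?} out of {} bytes", payload_range(&e), size_of::<E>());
}
```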

For something more general...

```rust
enum E {
    Variant(u16, u8, u32),
    // ...
}
```

...any "padding" could contain the discriminant, so you have to avoid overwriting any of it, not just trailing padding.

Padding may contain any value.

Also, I'm pretty sure I said this already, but I don't really care about supporting #[repr(Rust)] enums.

Rust-analyzer can do this. It shows the size (with discriminant) and alignment of each variant when you hover. I think formally, it's the size of the enum if all other variants are unit variants. This is good for deciding how to refactor code, but not so much for any automatic checks or runtime behavior.
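
You can approximate that measure by hand by declaring a copy of the enum in which every other variant is a unit variant. This sketch uses hypothetical enums Full and OnlyA. It is not necessarily how rust-analyzer computes its number. #[repr(u8)] is there only so the sizes are guaranteed.

```rust
use std::mem::size_of;

// Hypothetical enum; #[repr(u8)] makes the sizes below guaranteed.
#[allow(dead_code)]
#[repr(u8)]
enum Full {
    A(u16, u8, u32),
    B(u64),
}

// The same enum with every variant except A reduced to a unit variant.
#[allow(dead_code)]
#[repr(u8)]
enum OnlyA {
    A(u16, u8, u32),
    B,
}

fn main() {
    // A alone: tag@0, u16@2, u8@4, u32@8 -> 12 bytes; B(u64) pushes the whole enum to 16.
    println!("size of A ~ {}, size of Full = {}", size_of::<OnlyA>(), size_of::<Full>());
}
```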

I think the difference there is that it's a different type contained inside an enum variant, whereas the one in this thread is part of the enum itself.

```rust
enum E {
    Variant1(u16, u8, u32),   // can use padding
    Variant2((u16, u8, u32)), // can't use padding since it may be used
                              // to produce `&mut (u16, u8, u32)`
}
```

Ah, the "which subset needs to be copied for this variant". I actually explicitly added a section to https://lang-team.rust-lang.org/frequently-requested-changes.html#size--stride about how we could offer that :slight_smile:

There's no built-in thing for it. Feel free to propose it. It'd make sense for types, too, since #[repr(C, align(4))] struct Foo(i8); also only needs to copy bytes 0..1, not all of them.
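
Here is a quick check of the Foo example, using the stable offset_of! macro. The struct takes 4 bytes, but only byte 0 carries data, so copying bytes 0..1 would be enough.

```rust
use std::mem::{offset_of, size_of};

// The struct from the post: one byte of data, padded out to 4 by align(4).
#[allow(dead_code)]
#[repr(C, align(4))]
struct Foo(i8);

fn main() {
    let data_end = offset_of!(Foo, 0) + size_of::<i8>();
    // Prints "size_of = 4, bytes that carry data = 0..1".
    println!("size_of = {}, bytes that carry data = 0..{}", size_of::<Foo>(), data_end);
}
```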

There are many issues with this code, just to name a few:

  • if T is an enum it will likely not be valid for all byte patterns; if it has a specific repr then you can just manually compute its layout to get any size you care about without needing help from the compiler;
  • std::mem::copy doesn't have that signature, and std::ptr::copy has a similar one but takes a mutable pointer (and just adding a cast to your code will result in UB);

If you meant it as an example of how you would use this, I would suggest making it something that could actually be run, so that we can talk about what you can do and what is UB.

Using mem::copy was a simple typo. I know most enums aren't valid for all byte patterns; that example is restricted to enums that are (this could have been a bit clearer).

I'm curious what undefined behavior this invokes under the listed qualifications. We're asserting the enum is valid for all byte patterns, so it can't be an invalid value; there are no references to undersized allocations, and all pointers are well-aligned. What did I miss?

The version potentially without UB would look like this:

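```rust
use std::mem::MaybeUninit;
use std::ptr::{self, addr_of, addr_of_mut};
let mut old = T::V(somevalue);
// all-zero bytes (`transmute([0u8; size_of::<T>()])` is rejected for a generic `T`)
let new: T = unsafe { MaybeUninit::zeroed().assume_init() };
// the destination comes from a `mut` binding via `addr_of_mut!`, never from `&old`
unsafe { ptr::copy(addr_of!(new).cast::<u8>(), addr_of_mut!(old).cast::<u8>(), N) };
assert_eq!(old, new);
```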

The "adding a cast results in UB" remark mostly refers to the fact that your destination was &old, and writing through a pointer derived from a shared (non-mut) reference always causes UB.

Mutating the value in a let binding declared without mut might not be UB in some very specific cases. But even if that becomes the case (it isn't currently, and it is an open question), it isn't something you should ever do. It would still be wrong; it would just have a defined result.
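
Here is a runnable sketch of the original definition for one concrete type, a hypothetical #[repr(u8)] enum E. It does not require E to be valid for all byte patterns. Instead, it writes the arbitrary bytes into a MaybeUninit<E> buffer, since raw memory may hold invalid values, then copies the first N bytes of old into that buffer through a mutable pointer, and only then reads the buffer as an E. It checks only N equal to the end of each variant's last field. A smaller N could leave a garbage tag or field behind, and calling assume_init on that would itself be UB.

```rust
use std::mem::{size_of, MaybeUninit};
use std::ptr;

// Hypothetical repr(u8) enum: A's last field ends at byte 5, B's at byte 8; size_of is 8.
#[derive(Debug, PartialEq)]
#[repr(u8)]
enum E {
    A(u16, u8),
    B(u32),
}

/// Copies the first `n` bytes of `old` over a buffer filled with `fill`,
/// then reads the buffer back as an `E`.
///
/// # Safety
/// `n` must reach the end of the last field of `old`'s variant, so that only
/// padding still holds `fill`; a smaller `n` can produce an invalid `E`.
unsafe fn copy_prefix(old: &E, n: usize, fill: u8) -> E {
    let mut new = MaybeUninit::<E>::uninit();
    let dst = new.as_mut_ptr().cast::<u8>();
    unsafe {
        // Arbitrary bytes are fine: a MaybeUninit buffer may hold anything.
        ptr::write_bytes(dst, fill, size_of::<E>());
        // Source is `old`; the destination is the mutable buffer, not a shared reference.
        // The copy is untyped, so old's (uninitialized) padding bytes may be copied too.
        ptr::copy_nonoverlapping((old as *const E).cast::<u8>(), dst, n);
        new.assume_init()
    }
}

fn main() {
    for (old, n) in [(E::A(1, 2), 5), (E::B(3), 8)] {
        for fill in [0x00, 0xAA, 0xFF] {
            assert_eq!(unsafe { copy_prefix(&old, n, fill) }, old);
        }
    }
    println!("ok");
}
```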
