Enum with or without Box?


#1

Should I always prefer the enum variants with boxed types to ones without boxes?

enum A {
    A((u64, u64, u64)),
}

enum B {
    B(Box<(u64, u64, u64)>),
}

#2

The contained tuple in A lives wherever the enum value itself lives (on the stack, if the enum is on the stack), while in B it always lives on the heap. In memory, an A value contains the 24 bytes of the tuple, while a B value contains a pointer for the Box; either one also carries an enum tag if the compiler needs one to tell variants apart.

Which one you prefer depends firstly on where you want the tuple to live and secondly on your needs regarding the memory size and layout of the enum. If you don’t know about these things or aren’t concerned with them, you most likely prefer A.
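To make the size difference concrete, here is a minimal sketch that asks the compiler for the size of both enums from the question. The helper name `sizes` is made up for illustration, and the exact numbers depend on the target and on layout details that Rust does not formally guarantee for default-repr enums.

```rust
use std::mem::size_of;

#[allow(dead_code)]
enum A {
    A((u64, u64, u64)),
}

#[allow(dead_code)]
enum B {
    B(Box<(u64, u64, u64)>),
}

/// Hypothetical helper: returns (size of A, size of B) in bytes.
fn sizes() -> (usize, usize) {
    (size_of::<A>(), size_of::<B>())
}

fn main() {
    let (a, b) = sizes();
    // A holds the three u64 values inline; B only holds a pointer to them.
    println!("A: {a} bytes, B: {b} bytes");
}
```

On a typical 64-bit target this is expected to report a pointer-sized B and an A of at least 24 bytes, but it is worth running on your own target rather than trusting that.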


#3

What changes if I want to store big structures there, say a couple of them in a tuple inside each enum variant?


#4

@aschampion
How are these laid out (aligned) in memory? For example, do the enums A and B have the same alignment in both cases?


#5

The size of the enum will be the size of the largest variant, plus a discriminant (if one is needed, but that will be the case for virtually any enum with more than 1 variant).

Similarly, the alignment will be the largest alignment required by any variant (which is not necessarily the largest variant by size). In the case of A and B above, they’re both 8-byte aligned on 64-bit, and B would be 4-byte aligned on 32-bit.

As for @vityafx’s question, the answer depends :slight_smile:. A couple of big considerations I’d take into account:

  • if there’s one variant outsizing all the rest, and it’s not the dominant variant in terms of frequency, I’d box it.
  • if I’m copying (or moving) them around a lot more than accessing their data, I’d consider boxing the larger variants.

Otherwise, I wouldn’t box. Again, these are pretty general. The gist is I’d tailor the enum’s size to how I’m using it, and make sure a large variant (or two) doesn’t make me pay an unwanted perf penalty.
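As a rough illustration of the "one variant outsizing all the rest" case, the sketch below compares an enum with a large inline variant against the same enum with that variant boxed. The enum and function names are invented for this example, and the 256-byte payload is an arbitrary choice.

```rust
use std::mem::{align_of, size_of};

#[allow(dead_code)]
enum Unboxed {
    Small(u8),
    Large([u64; 32]), // 256 bytes stored inline in every value of the enum
}

#[allow(dead_code)]
enum Boxed {
    Small(u8),
    Large(Box<[u64; 32]>), // one pointer inline; the 256 bytes live on the heap
}

/// Hypothetical helpers so the sizes can be inspected or asserted on.
fn unboxed_size() -> usize {
    size_of::<Unboxed>()
}

fn boxed_size() -> usize {
    size_of::<Boxed>()
}

fn main() {
    println!("Unboxed: size {}, align {}", unboxed_size(), align_of::<Unboxed>());
    println!("Boxed:   size {}, align {}", boxed_size(), align_of::<Boxed>());
}
```

Every `Unboxed` value, including `Small`, occupies at least the 256 bytes of the large variant, whereas `Boxed` stays at around two machine words at the cost of a heap allocation whenever `Large` is constructed.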


#6

What about an enum of about 10 variants, each with 1 to 3 parameters (not wrapped in a tuple), where each parameter is a big structure of heap-allocated collections (String, Vec and so on)? Should I use Box in this case?


#7

It’s hard to say without knowing how you’re using this enum. 3 Vecs or Strings is a healthy 72 bytes on a 64bit system - copying that around a lot might be noticeable.

If in doubt, stick to unboxed layout and then see if profiling shows hotspots associated with copying/moving this enum around.


#8

The thing is that clippy warned me that I should use Box. But, according to its logic, I should use enums with Boxes all the time if my enum consists of big objects. Is that correct?

P.S. I am sending this enum to the mpsc::channel.


#9

Well how big is the enum exactly? For clippy to warn on it, I believe there has to be (by default) at least a 200 byte difference in size between the smallest and largest variants. That’s certainly more than just 1-3 Vec/String values :slight_smile:


#10

You misunderstood me.

I meant this:

struct A {
    a: Vec<String>,
    b: Vec<SomeOtherBigStructure>,
    c: String,
    d: HashMap<>,
}

struct B {
// ... the same as above
}

struct C {
// ... the same as above
}

enum E {
    A(A, C),
    B(A, B),
    C(A, B, C)
}

And so on.


#11

Note that Vec<SomeOtherBigStructure> allocates all of these big structures on the heap. So while each SomeOtherBigStructure may be really big, Vec<SomeOtherBigStructure> will almost certainly be 24 bytes large. (3 usize values: pointer, length, capacity)
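A quick way to check this is to compare the size of one element with the size of the Vec handle. In the sketch below the contents of `SomeOtherBigStructure` are a stand-in (a 4096-byte array), since the real definition is not shown in the thread, and the three-word Vec size is what current standard library implementations use rather than a documented guarantee.

```rust
use std::mem::size_of;

// Placeholder definition; the real structure from the thread is not shown.
#[allow(dead_code)]
struct SomeOtherBigStructure {
    payload: [u8; 4096],
}

/// Size of a single element, stored on the heap by the Vec.
fn element_size() -> usize {
    size_of::<SomeOtherBigStructure>()
}

/// Size of the Vec handle itself: pointer, length, capacity.
fn vec_handle_size() -> usize {
    size_of::<Vec<SomeOtherBigStructure>>()
}

fn main() {
    println!("one element: {} bytes", element_size());
    println!("Vec handle:  {} bytes", vec_handle_size());
}
```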


#12

Ok, that’s a bulky enum indeed. How is the enum used though? We keep talking about its size, which is important, but you’ve not said anything about how you’re using it. In particular, are you copying/moving it around a lot?

If you own the structs, you may also be able to re-organize their layout to better fit usage characteristics (e.g. move some of their parts to the heap, rather than moving the entire struct to the heap inside the enum). But really, it’s hard to speculate without having more concrete information.


#13

I did say that, you missed it :slight_smile:
I am sending objects of this enum to an mpsc::channel.


#14

Nope, I didn’t miss that :slight_smile:. What you’ve not said is whether you construct this enum right before sending it on the channel or whether you’re taking it (moving or copying) from somewhere. You’ve also not said how you’re accessing the data of the enum on the receiver side of the channel. You’ve also not indicated whether any of the 10 variants are more common/frequent than others, and whether they’re the bulkier of the bunch.


#15

All of them are used with nearly equal frequency. I construct these objects and pass them to the mpsc::channel immediately. The enum variants consist of event-related objects which I send when some event occurs. I access this data simply: I receive enum objects from the mpsc::channel's receiver and then match on them, unpacking the contents.
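The usage pattern described here might look roughly like the following sketch. The `Event` enum, its variants, and the `describe` and `run` functions are all hypothetical stand-ins, since the real types are not shown in the thread.

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical event type; variant names and payloads are placeholders.
enum Event {
    Renamed(String, String),
    Tagged(String, Vec<String>),
}

/// Receiver side: match on the event and unpack its contents.
fn describe(event: Event) -> String {
    match event {
        Event::Renamed(from, to) => format!("renamed {from} -> {to}"),
        Event::Tagged(name, tags) => format!("{name} tagged with {} tags", tags.len()),
    }
}

/// Sends two events from a producer thread and collects their descriptions.
fn run() -> Vec<String> {
    let (tx, rx) = mpsc::channel();
    let producer = thread::spawn(move || {
        // Each event is constructed and handed to the channel immediately.
        tx.send(Event::Renamed("a.txt".to_string(), "b.txt".to_string())).unwrap();
        tx.send(Event::Tagged("b.txt".to_string(), vec!["draft".to_string()])).unwrap();
        // `tx` is dropped here, which ends the receiver's iteration.
    });
    let out: Vec<String> = rx.iter().map(describe).collect();
    producer.join().unwrap();
    out
}

fn main() {
    for line in run() {
        println!("{line}");
    }
}
```

Sending moves the whole enum value into the channel, so its inline size is what gets copied on each send; the heap buffers behind the String and Vec fields are not copied.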


#16

Ok cool. In this case, it would seem ideal if we had placement construction where the enum can be constructed right in the channel. There’s some support for placement-new in nightly (I believe) but I don’t know offhand if channels have support for it.

The reason is that there’s some risk the enum will first be constructed on the stack and then copied into the channel, which is wasteful. But it’s possible LLVM can optimize this away; the only way to tell is by looking at the assembly. And even if this optimization does kick in, it’s conceivable that it’ll break if the code shape changes slightly.

However, unless you use the box syntax, there’s no guarantee this won’t happen with Box either! So in that regard, I’d stick with non-box storage to start. If you see the copying is showing up in profiling, you can try boxing it up and seeing what happens. Avoiding the box has an extra advantage of not going through the allocator two extra times (to alloc and then to free).


#17

Definitely not always; why would you create a pointer “just for the sake of it” when you don’t need one? You need to decide whether you need a Box (e.g. if your enum is recursive via its variants’ associated data, or if the type you are putting in the enum needs to have a stable address as with some OS-specific mutexes, etc.), and if you do, then sure, go for it — but I would actually prefer defaulting to direct embedding. (In most cases, the type system will tell you if that won’t work, and then you can change it.)
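The recursive case is the classic one where the compiler forces the decision. Below is a minimal sketch using a made-up `List` type: without the Box, the enum would contain itself directly and have no finite size, so the compiler rejects it and suggests adding indirection.

```rust
// A recursive enum: the Box gives the recursive field a known, pointer-sized footprint.
enum List {
    Cons(i32, Box<List>),
    Nil,
}

/// Sums all values in the list by walking the boxed tail.
fn sum(list: &List) -> i32 {
    match list {
        List::Cons(value, rest) => value + sum(rest),
        List::Nil => 0,
    }
}

/// Builds the two-element list 1 -> 2 -> Nil.
fn example() -> List {
    List::Cons(1, Box::new(List::Cons(2, Box::new(List::Nil))))
}

fn main() {
    println!("sum = {}", sum(&example()));
}
```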