Memory layout for enums and boxes

I'm implementing a quad tree and have come across a weird case regarding enum sizes.

Rust-analyzer is telling me that the compiler uses 40 bytes (on x64) to represent the following:

// size 40 
enum Node {
	Empty,
	Region {
		nw: Box<Node>,
		ne: Box<Node>,
		sw: Box<Node>,
		se: Box<Node>,
	},
	Body(BodyType), // BodyType is size 8
}

Which I guess comes from 4 pointers (32 bytes) + 8 bytes to differentiate between the enums. Now I'm wondering why doesn't the compiler use the zero values of the 4 boxes to differentiate between the different types and reduce the need for an additional 8 bytes. Ie. Empty = all boxes are zero, Body = one of them is not empty and is exactly the value of BodyType, Region = all boxes are non zero.

It seems to be already doing these kinds of memory layout optimizations on simpler types, as zero is a non valid value for Box and it is used to represent another empty enum. For example:

// size 8
enum TestBox1 {
	None,
	Some(Box<TestBox1>),
}
// size 16
enum TestBox2 {
	None,
	Maybe,
	Some(Box<TestBox2>),
	Other(Box<TestBox2>),
}

And how can I get my original example to shave off the extra 8 bytes?

The answer to that one is easy: compiler doesn't have conscience and couldn't perform ad-hoc optimizations. Every optimization it may do it done by human who have added it to the code.

Yes. Because these structures are common and used very often. That's why it was deemed important to implement them efficiently.

With the use of unsafe, I'm afraid.

It's a bit of catch22: compiler doesn't include code to optimize things like these because they are rare. And they are rare because people optimize them by hand and then don't need to have support for them from the compiler.

Just create a union and employ bit-twiddling tricks like you would do in C. Put in separate module and thoroughly test with Miri.

1 Like

well, you can do it without unsafe, just very ugly and you don't want to look at it. playground

key is, the compiler doesn't try to steal bit patterns, uh, "sparsely", let's say.

a struct with 4 Boxes surely can have more than one "invalid" (or unused) bit pattern representations, but you need to check all the words to gather the information together, as opposed to a single comparison.

4 Likes

by the way, this is example doesn't align with your first example. and size 16 is normal layout without special optimization. what do you expect it to be? maybe like this?

// this, however, is size 24
enum TestBox3 {
    None,
    Maybe,
    Some {
        box1: Box<TestBox3>,
        box2: Box<TestBox3>,
    }
}
1 Like

That's an odd way to phrase it.

But yes, the compiler has only a handful of pre-defined ways to do niche optimizations. I can't find exact issues now, but there were a few issues filed with ideas for more optimizations, and they're difficult to add due to being performance sensitive. For example if a comparison with Empty needed to read and compare four usizes, it could be worse overall than reading a byte of a separate discriminant field.

2 Likes

Right, it makes sense that the compiler by default would try to use only one field to discriminate between enum types.

@nerditation That's an interesting workaround, but it does make the code quite a bit harder to read.

I'll add some more fields going forward with this so it's probably going to be easier to pack memory in a more efficient way but it was an interesting exercise to understand how memory layout works.

Thanks to all who contributed.

This is one of the relevant proposals (and this one should come at little or no performance cost): The proposal is about allowing additional niches for pointers based on size and alignment of a valid referent. E.g. the original Node has alignment align_of::<usize>, which allows for a niche around 0 to include not only 0, but also 1, 2, and 3 on 32-bit, and up to 7 on 64-bit platforms, because the lowest valid word-aligned pointer is at address 4 or 8, respectively.

4 Likes