Memory layout optimization opportunity?

Hello,

I was pleasantly surprised that rustc is smart enough to make AOrB the same size as the larger of its variants in the following example:

#![allow(dead_code)]

struct StructA(usize);
struct StructB(usize, Option<u8>);

enum AOrB {
    A(StructA),
    B(StructB),
}

fn main() {
    dbg!(std::mem::size_of::<StructA>());
    dbg!(std::mem::size_of::<StructB>());
    dbg!(std::mem::size_of::<AOrB>());
}

which outputs:

[foo.rs:12] std::mem::size_of::<StructA>() = 8
[foo.rs:13] std::mem::size_of::<StructB>() = 16
[foo.rs:14] std::mem::size_of::<AOrB>() = 16

I don't know how this is implemented in practice, but the compiler must be using a bit pattern in the second half of AOrB that is invalid for StructB and thus signals when the A variant is present.

However, if in the above Option<u8> is replaced by u8, the size of AOrB grows to 24 bytes. Is there anything fundamental that prevents the compiler from realizing the same layout optimization?

In struct StructB(usize, Option<u8>), there is a discriminant for the Option that can only take two values, either representing Some or None variants. The remaining bit patterns of that discriminant byte are called a "niche", which can be reused by the outer enum AOrB to indicate outer variants. So if the Option discriminant uses 0 or 1, it might use 2 to represent AOrB::A. This also requires that StructA is smaller so it doesn't occupy the same memory as this niche.
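As a quick check of that niche reuse: Option<u8> is one tag byte plus one payload byte, and a nested Option reuses the spare tag values 2..=255 instead of growing.

```rust
fn main() {
    // Option<u8> stores a tag byte (0 or 1) next to the u8 payload.
    assert_eq!(std::mem::size_of::<Option<u8>>(), 2);
    // The unused tag values 2..=255 form a niche, so wrapping in another
    // Option reuses that same byte instead of adding a second tag.
    assert_eq!(std::mem::size_of::<Option<Option<u8>>>(), 2);
}
```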

When you change to a plain u8, there's no longer any niche available. There will be padding in StructB, but that could be written to any bit pattern by code that writes to &mut StructB, so there's no room to choose special values for AOrB. But a trick sometimes used is to explicitly introduce padding with a particular value, so the rest can be a niche, like:

#[repr(u8)]
enum Padding { Valid }
struct StructB(usize, u8, Padding);
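A runnable version of that trick (sizes assume a 64-bit target and current rustc's niche-filling, which isn't a layout guarantee):

```rust
#![allow(dead_code)]

// Padding may only ever hold the value 0, so the byte it occupies
// offers 1..=255 as a niche for an outer enum's tag.
#[repr(u8)]
enum Padding { Valid }

struct StructA(usize);
struct StructB(usize, u8, Padding);

enum AOrB {
    A(StructA),
    B(StructB),
}

fn main() {
    // StructB is still 16 bytes: 8 (usize) + 1 (u8) + 1 (Padding) + 6 padding.
    assert_eq!(std::mem::size_of::<StructB>(), 16);
    // The AOrB tag fits in Padding's niche, so the enum stays at 16 bytes
    // instead of the 24 seen with a plain trailing u8.
    assert_eq!(std::mem::size_of::<AOrB>(), 16);
}
```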

Thanks for the instantaneous reply!

Interesting, I didn't know that it's allowed to overwrite the padding in a struct.

An afterthought:

Since these are non-public types, the compiler could know that no one ever writes to the padding, couldn't it?

So, if someone bothered to implement it, said optimization could be realized for those private types where the compiler can prove that it's OK, I guess? This approach would also allow many other optimizations such as storing a [bool; 8] inside a single byte if no one ever has to take a reference to one of its elements.

I guess it would be problematic that size_of::<T>() would then depend on what code does or doesn't do. But the compiler could still apply such optimizations covertly, couldn't it? (I guess that inside a function, LLVM already does such things.)

It's allowed to e.g. move a struct by moving each field, ignoring the padding. More generally, padding bytes are considered not just "fill with random garbage", but uninitialized.

The compiler changing some padding into non-padding (so it could stash a discriminant there, say) would be problematic because the padding can get clobbered when using MaybeUninit to overwrite bytes, or other semantically equivalent operations (like memcpy).

I guess it's technically possible to do automatically if the compiler did some global analysis to see certain operations never happen, but that seems unlikely. It could be opt-in though.


The rules here come from a confluence of a few things:

  1. It would be extremely confusing if moving something made it "be different".
  2. The compiler wants the freedom to sometimes copy a type as a large chunk. For example, struct Foo([u8; 3], u32); is horribly awkward to copy just the correct parts, but copying it as one 64-bit thing works great. (Obligatory note that it's not guaranteed to be 8 bytes, but it happens to be today.)
  3. The compiler wants the freedom to sometimes copy a type field-by-field. For example, in #[repr(align(64))] struct Bar(u8); it would obviously like to just copy the one byte, rather than 64 bytes.
  4. Humans writing unsafe code want this flexibility too -- to be able to move things by memcpying them, but also to move or initialize them field-by-field.

The best way, then, to accommodate all those needs is to define, in the abstract machine, a 257th possible value for a byte: uninitialized. That way when you copy the whole thing, it's still uninitialized, which is allowed to be materialized by doing nothing in codegen. Or if it's never initialized in the first place, because you only wrote the fields, then the remaining bytes are still uninitialized, and reading them as u8 is UB.

(Of course, if you need to preserve particular values of things, you can use a MaybeUninit<Qux> instead of a Qux. That tells the compiler that it needs to keep around exactly the value that you wrote into the padding. Assuming you actually did write one, of course -- if you write a Qux over that memory, it goes back to being uninitialized again, because that's what you wrote in the AM, even if the assembly doesn't necessarily write anything in the codegen.)
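A small sketch of that MaybeUninit guarantee, using a hypothetical repr(C) Foo so the padding byte sits at a known offset:

```rust
use std::mem::MaybeUninit;

#[repr(C)]
struct Foo([u8; 3], u32); // repr(C): one padding byte at offset 3

fn main() {
    let mut slot = MaybeUninit::<Foo>::uninit();
    let bytes = slot.as_mut_ptr() as *mut u8;
    unsafe {
        // Fill every byte, padding included; MaybeUninit<Foo> must keep them all.
        for i in 0..std::mem::size_of::<Foo>() {
            bytes.add(i).write(0xAA);
        }
        // The padding slot at offset 3 still holds exactly what we wrote.
        assert_eq!(bytes.add(3).read(), 0xAA);
        // But writing a whole Foo makes that byte padding again: in the
        // abstract machine it is now uninitialized, whatever the assembly does.
        slot.write(Foo([1, 2, 3], 4));
    }
}
```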


For the reasons already touched on, it's not really practical to derive and maintain this knowledge, as it's both about not writing the bytes and not not writing the bytes.

And if you have a type where the size overhead of not having a niche becomes problematic, it's not particularly difficult to introduce a niche, either with a single-variant enum like the Padding shown earlier or with a Box indirection. The only "difficult" part is knowing when doing so is beneficial.
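For illustration, the Box route might look like this (sizes assume a 64-bit target; note the tag can't use Box's null niche here, because variant A's usize can be any value):

```rust
#![allow(dead_code)]

struct StructA(usize);
struct StructB(usize, u8); // no niche of its own

enum AOrB {
    A(StructA),
    B(Box<StructB>), // indirection: the variant payload shrinks to one pointer
}

fn main() {
    // Both payloads are 8 bytes, plus a separate tag word: 16 bytes total,
    // down from the 24 bytes of the by-value version.
    assert_eq!(std::mem::size_of::<AOrB>(), 16);
}
```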

This kind of "knowing when" optimization is usually delegated to the compiler, which generally has a better chance of knowing when than the developer does. But in this case, due to the low-level memory control Rust offers (e.g. untyped memory), it falls to the developer to make this layout decision.


Thanks to everybody for all the insightful comments. Once again I learned a lot!

I knew of Ralfj's blog posts about uninitialized memory and how “low-level” languages effectively run on an imaginary virtual machine rather than on bare metal, but still the consequences for enum layout were not obvious to me.


Here is one more aspect of the above that I find noteworthy:
If one gets rid of the explicit StructA and StructB types and instead defines AOrB directly as

enum AOrB {
    A(usize),
    B(usize, u8),
}

the size of AOrB is 16 bytes, not 24 like with the explicit variant types.

It seems clear to me why this is the case: since it is now impossible to hand out a reference to a variant's payload as a standalone struct, the compiler is able to find a “niche” to store the discriminant.
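Checking that size on a 64-bit target (a sketch against current rustc; the exact layout isn't guaranteed):

```rust
#![allow(dead_code)]

enum AOrB {
    A(usize),
    B(usize, u8),
}

fn main() {
    // No reference to B's payload as a struct can ever be handed out, so the
    // compiler is free to put the tag in what would have been B's trailing
    // padding: 16 bytes, not 24.
    assert_eq!(std::mem::size_of::<AOrB>(), 16);
}
```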

However, this also means that if all enum variants were automatically turned into types (see the first post of the thread linked here for an overview), there would be less room for layout optimizations.


Not necessarily, because the compiler knows they're enum variants, so it can pick a different layout than the naive struct layout.

Good point!