Why does a bigger type become the smaller one when wrapped in Option?

(Sorry, if the title is kinda confusing.)

#![allow(unused)]

struct Normal {
    a: u32,
    b: u16,
}

struct Padded {
    a: u32,
    b: u16,
    _padding: bool,
}

fn main() {
    assert_eq!(std::mem::size_of::<Normal>(), 8);
    assert_eq!(std::mem::size_of::<Option<Normal>>(), 12);
    assert_eq!(std::mem::size_of::<Padded>(), 8);
    assert_eq!(std::mem::size_of::<Option<Padded>>(), 8);
}

Rust somehow prefers to add 6 more bytes to Normal when wrapping it in an Option rather than just adding 2 bytes to yield an 8 byte alignment.

If I artificially add a padding bool, Option just adds a single byte, preserving the 8 byte alignment.

Using a NonZero… also makes Option less greedy (→ 8 bytes).

struct NonZero {
    a: std::num::NonZeroU32,
    b: u16,
}

Is this behavior expected? Is there a way to make an Option of an ordinary (non-padded, non-NonZero) 6-byte type without bloating it beyond 8 bytes?

Edit: It feels like Null-Pointer-Optimization only looks for space within existing fields and otherwise falls back to just add 4 bytes to the type (with a padding of 4 bytes).

2 Likes

The optimization that allows size_of::Option<Foo>() to be the same as size_of::<Foo>() relies on so-called “niches” which are bit-patterns that are known to be invalid for the type Foo. The type Padded does have invalid bit-patterns: the _padding: bool field will always be valued with a byte 0 or 1 (the run-time representation false or true, respectively), so that a value of – say – 2 can be used in this position to represent a None: Option<Padded>.

On the other hand, perhaps somewhat surprisingly but actually true, Normal has no niche! The reason is that the space where Padded has the _padding field, in Normal, there is uninitialized memory; and at run-time any bit-pattern could be present in this space. There is also no sound way to utilize this memory, mainly because use-cases such as manipulating a &mut Normal derived from a (Some-valued) Option<Normal> must be allowed, and you can write any bit-pattern into the padding of the reference target’s padding through a reference: &mut Normal. E.g. just assigning *reference = some_value() will do a memcpy of a whole Normal value, including the padding which is usually uninitialized memory, hence you’re writing an arbitrary bit-pattern at run-time, with such a memcpy, to the padding of *reference, which means that no bit pattern can be reserved to indicate None without risking that the Option<Normal> that reference was pointing into suddenly would be (erraneously) interpreted as None.


If you want to use an even more restricted padding value instead of a bool, you could e.g. use a repr(u8) single-value enum such as

#[repr(u8)]
enum Padding { Padding = 0 }

struct Padded {
    a: u32,
    b: u16,
    _padding: Padding,
}

If someone else knows a nicer approach to achieve a similar thing, feel free to add an answer ^^

While this approach has an advantage of allowing a smaller Option<Padded> type, the (minor) disadvantage of course is that every Padded value will need to have its _padding zero-initialized at run-time, which is some (albeit really minor) run-time overhead.

6 Likes

Thanks for the thorough explanation.
But that still doesn't explain why Option falls back to consume 4 bytes if no niche could be found.

It’s 4 bytes because Normal needs a 4-byte alignment (for the u32s inside), so Option<Normal> needs to have 4-byte alignment, too; and the size of a type always needs to be a multiple of the alignment, so the next larger possible alignment-4 size after 8 bytes is 12 bytes.

But what prevents the compiler from just adding another u16 to represent the Option's variant?

// pseudo
struct Option<Normal> {
    a: u32,
    b: u16,
    is_some: u16
}

In case the basic approach of representing enums such as Option is not clear: (In the usual case, i.e. without niche-optimization) such a “tagged union” is represented by adding a tag field next to the payload. The tag for Option, which has two cases, will be a u8, and the tag will need to live outside of the Normal. In other words, Option<Normal> will thus have the same size as (Normal, u8) (or (u8, Normal)), which is 12 bytes (due to alignment, as explained above).

1 Like

It’s the same thing that prevents (Normal, u16) from being represented like this: The tag must live outside of the memory that can be manipulated through a &mut Normal that you could obtain e.g. with Option::as_mut.

So this is just a missed opportunity within the null pointer optimization?

It’s not a missed opportunity since there literally is no way to fit the additional information (i.e. the Option tag) into the padding space of the Normal value without currently allowed bit manipulations becoming unsound; bit manipulations such as implementing writing a Normal value to a &mut Normal reference target by merely copying/overwriting all the data including potential padding.

Also, I would only call it “null pointer optimization” for Option<Foo> in cases where Foo is (or contains) some kind of pointer-type, whereas in Normal, there’s no pointers involved.

3 Likes

Now it clicked, thanks for your patience!

So Null * Optimization may not change the original memory layout but only sneak in a flag. And the fall-back is just plain wrapping into another type.

Yup, that‘s accurate.

Whether this is true probably depends on what you include in “memory layout”. The layout may change in that new bit patterns may become allowed in the resulting type. Also, the Rust compiler (does not guarantee but) does perform “null pointer optimization”-style niche-optimizations even in some cases where multiple enum variants have fields, in which case the whole enum can have a more complicated “memory layout” (since the memory layout allows different cases of tagged union values). E.g.

enum Enum {
    Foo((u32, u16, u8, bool,)),
    Bar((f32,)),
    Baz(([u8; 7],)),
    Qux((u16, u16, u16,)),
}

fn main() {
    assert_eq!(std::mem::size_of::<Enum>(), 8);
    assert_eq!(std::mem::size_of::<(u32, u16, u8, bool)>(), 8);
}

In the above example, the niche is in the bool of Foo’s field, while Bar’s field is small enough so that the tag can be (in the same position as the niche of the bool) existing next to / outside of that field. The same is then true for Baz and Qux, too.

So in case of Foo the value has a bool’s value (either 0 or 1) in the last byte, which indicates the Foo case; while for Bar, the last byte (located after the (f32,)) is a tag with value 3, for Baz it has value 4 and for Qux it has value 5. This tag still has a niche left for values >= 6, so even another level of enum, say, e.g. a Option<Enum> or even Option<Option<Option<Enum>>>[1] can still be the same size in this case.

(Just to be clear, the success of the above optimization can be observed, but is not guaranteed to always happen in future compiler versions, since the memory layouts involved are not stable.)


  1. or Result<Option<Option<Option<Enum>>>, [u8; 7]> ↩︎

4 Likes