How is `Vec<Option<T>>` compiled?

How is Vec<Option<T>> compiled? From my understanding, Option<T> has a variable size: 1 byte when None, and size::<T>() bytes when Some. Is None streached in this case to take up more than 1 byte?

1 Like

Enums, like Option, are sized to hold their largest variant plus a discriminant. So, Option<T> is usually a bit larger than T. The actual amount larger depends is the required memory alignment of T: A bunch of contiguous Option<T>'s in memory will need to all have their T's aligned properly, and the discriminant is padded to ensure this.

In some cases, however, the compiler can determine that there's some bit pattern that never occurs in T and will use that to represent None. This is called niche optimization. Because references can't be null, for example, Option<&T> will represent None as the null pointer, without any extra space requirements.

6 Likes

To be clear, this is wrong. As long as T: Sized, [edit: no qualifier needed, thanks @steffahn] every value of type Option<T> has the same size, which as @2e71828 said is the size of T plus the size of the discriminant (an initial field that indicates whether this value is a None or a Some) plus possibly some padding to preserve the alignment of T.

6 Likes

And to elaborate on that: this means that some of the bytes of the storage of an enum happen to be dynamically uninitialized/unused whenever the enum holds a variant which has a size smaller than the biggest variant.

I.e. you can't change the size of a value at runtime. But you can choose to make it as big as necessary, and then only use part of it.

Imagine going hiking with a backpack. You have to decide which backpack you bring with yourself. It might be a small or a big one, but once a backpack is fabricated, you can't change its volume after the fact.

Now if you don't know upfront what kind of package you'll want to fit in your backpack, then what do you do? Well, you bring a large one. If you happen to only need a few things later, you can still put small things in a backpack and have some empty space in it. That won't change the size of the backpack itself, though.

4 Likes

For instance, given the following program (which is technically UB since it reads padding bytes, so the results are not guaranteed):

fn main ()
{
    let mut v: Vec<Option<u16>> = vec![
        Some(0xdead),
        Some(0xdeed), // <- will be `None`
        Some(0xbeef),
        Some(0x0000), // <- will be `None`
    ];
    v[1] = None;
    v[3] = None;
    println!("{:#x?}", unsafe { ::core::slice::from_raw_parts(
        v.as_ptr().cast::<u8>(),
        v.len() * ::core::mem::size_of::<Option<u16>>(),
    )});
}

I got:

[
    0x1,     // | discriminant for `Some`
    0x0,     // |
    0xad,    //     | 0xdead (little-endian)
    0xde,    //     |

    0x0,     // | discriminant for `None`
    0x0,     // |
    0xed,    //    | garbage / padding / uninit
    0xde,    //    | (in this instance we can see the trace of a past `Some(0xdeed)`

    0x1,     // Some
    0x0,     //
    0xef,    // 0xbeef
    0xbe,    //

    0x0,     // None
    0x0,     //
    0x0,     // <garbage>
    0x0,     //
]

Rust gives no guarantees regarding, for instance, the size of the discriminant (here, two bytes (second byte may be padding, though)), and the relative layout between discriminant and payload. To opt into that kind of guarantees, one needs to define their own enum with explicit #[repr(…)] annotations. For instance:

#[repr(u8)] // or `#[repr(C, u8)]`, here both are equivalent.
enum MyOptionU16 {
    None     ,
    Some(u16),
}
  • In that case, it would be guaranteed that the layout of MyOptionU16 would be:

    1. First byte would represent the discriminant: 0 for None, and 1 for Some;

    2. Second byte would be a padding byte;

    3. Third and fourth bytes would carry the u16 payload, in the Some case, or uninit bytes / padding in the None case.

5 Likes

I think you can remove the condition. Option, as well as any enum, cannot (currently) have any unsized field.

1 Like

Oops, of course -- sorry, I was thinking of structs for some reason.

To add to this discussion, Vec<T> is the same size as Option<Vec<T>>:

  • Vec contains a RawVec<T, A>
  • RawVec contains a Unique<T>
  • Unique's valid range starts at 1

Therefore, the compiler could represent None by placing a 0 where the Unique would go.

And effectively, it seems to be the case.

3 Likes

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.