Fast initialization of Vec<Option<Vec<T>>>

I've originally posted a related question on Reddit, see https://www.reddit.com/r/rust/comments/o4yvuy/why_resizing_vecoptionvecu32_10x_slower_than/. So, if I do

let mut v: Vec<Option<Vec<u32>>> = Vec::with_capacity(1_000_000);
v.resize(1_000_000, None);

then instead of allocating a zero-initialized chunk of memory, Rust allocates uninitialized memory and then sets every third group of 8 bytes to zero, which is substantially slower. Is there a way to initialize such a type fast?

The reddit thread talks about alloc_zeroed that is sometimes used by Vec to optimize allocations, but not in your case. But we can call alloc_zeroed manually! (playground)

        type OptVec = Option<Vec<u32>>;
        let layout = alloc::Layout::array::<OptVec>(size).expect("overflow");
        let ptr = alloc::alloc_zeroed(layout) as *mut OptVec;
        Vec::from_raw_parts(ptr, size, size)

(restating a little what's been said in the thread and OP) Vec has a non-zero field and thus Option<Vec> uses null-pointer-optimization, so all-zeroes happens to be recognized as a valid None in this case. I believe this is not guaranteed though (edit: I mean guaranteed for a Vec), so I've added a little check in the playground (I wouldn't trust the check either :upside_down_face:).

Anyway, looks like this speeds up the initialization an order of magnitude ā€“ from mss to Āµss! But still, the slowest part of this whole thing is drop, which seems unavoidable (unless you leak the whole thing :wink:)

1 Like

Null pointer optimization is guaranteed for Option and all Option-like enum types.

The check itself is also UB since all zeroes is not guaranteed to be a valid bit pattern for Option<Vec<u32>> :confused:

@SkiFire13, why all zeroes is not a valid bit pattern? Is None sometimes represented by something else than zero?

@krdln, Thanks, that answers my question. And yes, I don't think drop is avoidable.

Its layout is just not defined (except for some particular cases), so you can't rely on that. The fact that the current implementation often uses all zeros for None doesn't make it guaranteed.

Is it guaranteed that NPE-ed None is all zero? If the sizeof NonZero == sizeof Option<NonZero> is the only thing guaranteed it would be legal to let None::<Vec> have bit pattern like {ptr: null, cap: 1, len: 0} since it still isn't a valid bit pattern for Vec.

The question is actually whether there is only a single valid representation for a null pointer optimized None. It would make perfect sense to have all bit patterns resolve to None of which the first word is all zeros, regardless of the remaining two words.

I fully agree with it. But it still is UB if it's not guaranteed.

Yes, sure.

For the types that don't impose additional restrictions I would say that it must be guarantee since it can't be anything else. For example NonZero only has the 0 bit pattern free, so None::<NonZero> can only be 0, similar for NonNull which can't be null (although is null guaranteed to be 0?). Things are a bit different for e.g. &T where T has an align bigger than 1, because now None::<&T> could be align_of::<T>+1 (not really practical, but there's no guarantee it's not like this).

Rather than saying that those bytes can take any bit pattern, which kinda implies they are initialized and have a specific value which must be carried over when moving, I would consider them padding bytes, which are just undefined/uninitialized.

Indeed, that's another sensible formulation, although I reckon that would be less convenient, since you would need to ensure not to read them accidentally (which is insta-UB even for a primitive integer).

It is not even guaranteed that Vec<T> will contain NonNull, so it's possible that std::mem::size_of::<Vec<T>>() != std::mem::size_of::<Option<Vec<T>>>(). Edit: Incorrect, see @bluss's comment below.

It's reasonable to be skeptical, but the docs mention it in this case

Vec<T>:

The pointer will never be null, so this type is null-pointer-optimized.

2 Likes

Reading padding bytes is not UB by itself, it depends on where you put/how you interpret those bytes, which usually mean you're doing some kind of transmute so you already need to be careful. Note that this also happens with structs, it's not exclusive to Option and you already have to deal with it.

Writing them to [u8] is UB, however, and OP seems to need exactly that.

1 Like

I'm new to Rust, and it was an interesting discussion to follow. I see that using alloc_zeroed for a faster initialization is UB, so that's a clear cost of that solution. Admittedly, my question was more hypothetical. It did originate in a particular use case (which I didn't mention) but there the difference in performance between the slower and faster initialization is negligible in comparison to the total runtime. Also, as has already been pointed out, the cost of drop always remains. I guess a broader question would be, is it then a very niche optimization or is it more generally useful?

Many common vec elements don't have drop glue, so then there is no such cost. For example a Vec of &T.

Which part of the optimization do you mean exactly? The common case for Vec's use of alloc_zeroed is a use case like vec![0; N] which is specific, but quite common and useful. It's then extended to a few more neighbouring cases like the "non-nullable pointer optimized" types.

Which part of the optimization do you mean exactly?

I meant the initialization optimization that krdln posted in the beginning, for type Vec<Option<Vec<T>>>.