Provenance when reading pointers from erased type

my apologies. after some pondering, I think my statement was wrong; a better wording would be: "MaybeUninit solves problems that are orthogonal to provenance".

MaybeUninit, as its name suggests, deals with uninitialized memory. it's UB to read uninitialized values, or to create references to them, unless the value has the type MaybeUninit<T>. here T can be any type; this includes, but is not limited to, pointer types.
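
to make the point concrete, here is a minimal sketch (the function name `init_array` is made up for illustration). the storage is typed as MaybeUninit<u32> for as long as it may be uninitialized, and only becomes u32 once every slot has been written:

```rust
use std::mem::MaybeUninit;

/// Hypothetical helper: fills a 4-element array one slot at a time,
/// without ever creating a `u32` value from uninitialized memory.
fn init_array() -> [u32; 4] {
    // Four uninitialized slots; no `u32` value exists yet.
    let mut slots: [MaybeUninit<u32>; 4] = [MaybeUninit::uninit(); 4];

    for (i, slot) in slots.iter_mut().enumerate() {
        // `write` initializes the slot without reading its old contents.
        slot.write(i as u32 * 10);
    }

    // SAFETY: every slot was written in the loop above.
    slots.map(|s| unsafe { s.assume_init() })
}
```

calling `init_array()` should give back `[0, 10, 20, 30]`. skipping the loop and calling `assume_init` anyway would be UB, even though u32 has no invalid bit patterns.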

provenance is just extra information carried by a pointer (in addition to its memory address). when reasoning about provenance, we must have valid pointers to begin with. uninitialized pointers are obviously not valid pointers.

in the linked post, MaybeUninit is mentioned when they talk about transmutations, where MaybeUninit is "the right type to hold arbitrary data", to avoid UB related to "under-specified layout". naturally, "arbitrary data" includes pointer types, and it happens that the exact layout of pointer types (consisting of provenance and address, plus metadata for "fat" pointers) is unspecified.

so, using a slice of u8 as the output parameter to store untyped ("arbitrary") data would indeed be incorrect. using Vec<u8> is not immediately UB, but it is still unsound. instead, we can either use a Vec (or slice) of MaybeUninit<u8>, or we can use raw pointers (e.g. NonNull<u8>).
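
a rough sketch of the MaybeUninit<u8> option, assuming the buffer only needs to hold a single pointer (the helper name `roundtrip` is hypothetical). the pointer is copied in with an untyped byte copy and read back with `read_unaligned`, because the byte buffer is only guaranteed to be 1-byte aligned:

```rust
use std::mem::{size_of, MaybeUninit};
use std::ptr;

/// Hypothetical helper: stores a `*const u32` in an untyped buffer,
/// reads it back, and dereferences the restored pointer.
fn roundtrip(value: &u32) -> u32 {
    let original: *const u32 = value;

    // Untyped storage: each byte may be uninitialized or carry provenance.
    let mut buf = [MaybeUninit::<u8>::uninit(); size_of::<*const u32>()];

    unsafe {
        // Byte-wise copy of the pointer value into the buffer.
        ptr::copy_nonoverlapping(
            (&original as *const *const u32).cast::<MaybeUninit<u8>>(),
            buf.as_mut_ptr(),
            buf.len(),
        );

        // The buffer is only byte-aligned, so read the pointer unaligned.
        let restored: *const u32 = buf.as_ptr().cast::<*const u32>().read_unaligned();

        // SAFETY: `restored` was copied from a pointer derived from `value`,
        // which is still borrowed for the duration of this call.
        *restored
    }
}
```

with `let x = 42u32;`, `roundtrip(&x)` should return 42. declaring the buffer as `[u8; N]` instead would assert that every byte is an initialized plain integer, which is exactly the claim we cannot make for pointer bytes.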

note this is the same situation as transmutation: type punning through memory behaves "as if" the value was transmuted. it is not specific to provenance per se, but it applies to pointers, and thus to their associated provenance.

This is something I don't fully get yet. It seems to me that using Vec::<u8>::as_mut_ptr() or Vec::<u8>::spare_capacity_mut() would yield the same result. What is the intuition behind it being meaningfully different from Vec<MaybeUninit<u8>>?

With Vec<u8> the compiler can assume that the elements in 0..vec.len() are valid and initialised, and that they can't store provenance, because u8 can't store provenance. You could probably get around that by keeping the Vec length at 0 all the time, but at that point the Vec is pretty useless.

Vec::<u8>::spare_capacity_mut() returns a &mut [MaybeUninit<u8>], but it is expected (see the documentation) that you later call set_len(), and then those MaybeUninit<u8> elements get turned into plain u8. With set_len you basically tell the compiler that you (the programmer) have initialized them, so it can drop the MaybeUninit. That is not what you want, because at that point you would lose provenance.
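
For reference, this is roughly the pattern the documentation has in mind, sketched here for plain byte data (the helper name `fill_spare` is made up for illustration):

```rust
/// Hypothetical helper: appends `n` copies of `byte` to `vec` by writing
/// into its spare capacity and then extending the length.
fn fill_spare(vec: &mut Vec<u8>, n: usize, byte: u8) {
    // Guarantees capacity >= len + n, so the spare slice has at least n slots.
    vec.reserve(n);

    // The unused tail of the allocation, typed as `&mut [MaybeUninit<u8>]`.
    let spare = vec.spare_capacity_mut();
    for slot in &mut spare[..n] {
        slot.write(byte);
    }

    let new_len = vec.len() + n;
    // SAFETY: the first `n` spare slots were initialized just above.
    unsafe { vec.set_len(new_len) };
}
```

Starting from `vec![1u8, 2]`, calling `fill_spare(&mut v, 3, 7)` should leave `[1, 2, 7, 7, 7]`. After set_len the new elements are ordinary u8 values, which is fine for plain bytes but is the step that goes wrong when the bytes actually belong to a pointer.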

as_mut_ptr() is basically the same, I think. On one hand, yes, you are using raw pointers, so the memory doesn't have to be initialized, or of the correct type, or anything. But again, the docs assume that you want to call set_len later, with all the problems I mentioned.

Another question that came to my mind is: have you taken care of padding?
If you write an object into your byte buffer and that struct has padding bytes, those bytes in the buffer will be uninitialised. With MaybeUninit<u8> that won't be a problem.
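
A small sketch of the padding case, assuming a made-up `Record` struct with `#[repr(C)]` so that the padding after `tag` is predictable on typical targets. Because the buffer is MaybeUninit<u8>, it does not matter that the copied padding bytes are uninitialised:

```rust
use std::mem::{size_of, MaybeUninit};
use std::ptr;

/// Hypothetical example type. With `repr(C)` there are usually three
/// padding bytes between `tag` and `value` (u32 is 4-byte aligned on
/// common targets).
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
struct Record {
    tag: u8,
    value: u32,
}

/// Hypothetical helper: copies a `Record` into an untyped buffer and back.
fn roundtrip_record(rec: Record) -> Record {
    // Untyped storage sized for the whole struct, padding included.
    let mut buf = [MaybeUninit::<u8>::uninit(); size_of::<Record>()];

    unsafe {
        // Byte-wise copy; padding bytes may stay uninitialised in `buf`.
        ptr::copy_nonoverlapping(
            (&rec as *const Record).cast::<MaybeUninit<u8>>(),
            buf.as_mut_ptr(),
            buf.len(),
        );

        // SAFETY: the buffer holds a complete `Record`; it is only
        // byte-aligned, hence the unaligned read.
        buf.as_ptr().cast::<Record>().read_unaligned()
    }
}
```

`roundtrip_record(Record { tag: 1, value: 99 })` should return an equal `Record`. The same copy into a `[u8; N]` buffer would leave uninitialised padding in bytes that are typed as initialised integers.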

Really appreciate the explanation, that makes sense now!

Time and time again I am impressed by how much more nuanced unsafe Rust is compared to C :sweat_smile:

Yes, for this particular purpose the buffer will be allocated to be as large as the data structure it represents, including padding bytes. In this particular instance I'll only be storing pointers and u32s, so it is not difficult to handle.
