use std::{alloc::Layout, mem::MaybeUninit};
fn main() {
    let layout = Layout::array::<MaybeUninit<u8>>(10).unwrap(); // [MaybeUninit<u8>;10]
    unsafe {
        let ptr = std::alloc::alloc(layout); // #1
        let rf = std::slice::from_raw_parts(ptr, 10); // #2
        std::alloc::dealloc(ptr, layout);
    }
}
#1 allocates memory for an array of type [MaybeUninit<u8>; 10], and the memory locations occupied by the array are uninitialized. #2 then produces a reference to a slice that points to that uninitialized array; the reference is of type &[MaybeUninit<u8>].
The Rust compiler assumes that all values produced during program execution are “valid”, and producing an invalid value is hence immediate UB.
[...]
A reference or Box<T> must be aligned and non-null, it cannot be dangling, and it must point to a valid value (in case of dynamically sized types, using the actual dynamic type of the pointee as determined by the metadata). Note that the last point (about pointing to a valid value) remains a subject of some debate.
Since [MaybeUninit<u8>] is a dynamically sized type, the actual dynamic type of the pointee is [MaybeUninit<u8>; 10]. The uninitialized memory for an array [MaybeUninit<u8>; 10] should be considered an invalid value, because the reference does not point to the type MaybeUninit<[MaybeUninit<u8>; 10]>.
The difference between them is that the memory occupied by [T; N] must be initialized for the value of type [T; N] to be valid, whereas the memory occupied by MaybeUninit<T> can be uninitialized.
However, I often see libraries use &[MaybeUninit<u8>] to denote a piece of uninitialized memory. I tested the code under Miri, and it doesn't report UB. So, what's the reason here? Is my understanding of the cited rules off?
Is there any document or reference that says that an array of MaybeUninit<T> can have all the memory it occupies uninitialized? Furthermore, how about a struct, for example
The short answer was already given above: &MaybeUninit<T> is a reference to the MaybeUninit object itself, not to the data it may contain. And MaybeUninit is always considered a fully initialized instance, even if it holds uninitialized memory internally.
&[T] and &[MaybeUninit<T>] are two different types. The compiler reasons primarily about types, not the raw bytes they contain.
This section from the documentation formally explains the rule:
For a union, the exact validity requirements are not decided yet. Obviously, all values that can be created entirely in safe code are valid. If the union has a zero-sized field, then every possible value is valid. Further details are still being debated.
Under the hood, MaybeUninit is a union with a zero-sized () field.
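To make the "union with a zero-sized field" point concrete, here is a minimal sketch. SketchMaybeUninit and make_uninit are hypothetical names made up for illustration; this is not the standard library's definition, which has the same two-field shape but carries extra attributes and methods.

```rust
use std::mem::{size_of, ManuallyDrop};

// Hypothetical stand-in for std::mem::MaybeUninit<T>, for illustration only.
#[allow(dead_code)]
union SketchMaybeUninit<T> {
    // Zero-sized field: selecting it initializes no bytes of the union.
    uninit: (),
    // The payload, which may never have been written.
    value: ManuallyDrop<T>,
}

// Entirely safe code: constructing a union by naming one field needs no `unsafe`.
fn make_uninit<T>() -> SketchMaybeUninit<T> {
    SketchMaybeUninit { uninit: () }
}

fn main() {
    let slot: SketchMaybeUninit<u64> = make_uninit();
    // The union must be able to hold its largest field.
    assert!(size_of::<SketchMaybeUninit<u64>>() >= size_of::<u64>());
    let _ = slot;
}
```

Because the value is built in safe code and the union has a zero-sized field, both sentences of the quoted rule apply to it, whatever the payload bytes happen to be.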
MaybeUninit<T> is always considered as a valid value even though the memory it holds is uninitialized; this is true. However, you didn't explain why [MaybeUninit<T>;N] is also a valid value when the array itself occupies the uninitialized memory.
So, I think the key point is documented in the reference:
A struct, tuple, and array requires all fields/elements to be valid at their respective type.
In other words, an array has a valid value if all of its elements are valid. Since each element of the array [MaybeUninit<T>; N] is of type MaybeUninit<T>, and a MaybeUninit<T> is valid whatever its bytes contain, the containing array is valid even though it occupies uninitialized memory.
It's been said above: the elements are valid, so an array of them is too. The elements are already covered by their type, so what matters for the array itself is that it's correctly allocated (pointer address and length) and that it'll be correctly deallocated (and of course that you didn't create any aliasing issue).
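A small sketch of that rule in action, using only safe code to build the array. The function name fill_and_sum is made up for the example.

```rust
use std::mem::MaybeUninit;

fn fill_and_sum() -> u32 {
    // Safe code: every element is an uninitialized MaybeUninit<u8>,
    // and the array built from them is still a valid value.
    let mut buf = [MaybeUninit::<u8>::uninit(); 10];

    // A slice reference to it is valid for the same reason.
    let slice: &mut [MaybeUninit<u8>] = &mut buf;
    for (i, slot) in slice.iter_mut().enumerate() {
        slot.write(i as u8);
    }

    // SAFETY: every element was written in the loop above.
    slice
        .iter()
        .map(|slot| u32::from(unsafe { slot.assume_init_read() }))
        .sum()
}

fn main() {
    println!("{}", fill_and_sum());
}
```

The only unsafe step is the final assume_init_read, which is where the code asserts that the bytes are initialized. Creating the array and borrowing it needed no such assertion.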
Note that it's not the reference in itself that could cause undefined behaviour. The problem is using it when it's not sound.
I'm not sure I understand your point, but if you meant that a &T pointing to an uninitialized T cannot cause UB as long as it is not used, then this is wrong. The mere existence of an invalid reference is immediate UB, because the compiler can optimise based on the knowledge that references point to a valid T.
However, if you mean that &MaybeUninit<T> cannot cause UB unless used incorrectly, this is of course true. As long as it is aligned to the alignment of T and points to an allocated object containing T, the actual memory does not have to be fully initialised.
Do you have a concrete case to illustrate that? I'd find it very surprising, but maybe I'm missing something. In this case, at worst the compiler would be stopped from doing optimizations since there's a reference to some location (since there's no effective lifetime, I'm not even sure it could do anything at all).
Producing an invalid value is immediate UB, and a reference (or a Box, but not a pointer) that’s dangling or does not point to a valid value is an invalid value. "Producing" essentially means "causing to exist in any way" during an execution.
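For contrast, here is one way to stay within that rule when the goal really is a &[u8]: keep a raw pointer while the memory is uninitialized, and only create the reference after every byte has been written. This is a sketch under stated assumptions; init_then_borrow is a hypothetical helper name, not something from the thread.

```rust
use std::alloc::{alloc, dealloc, handle_alloc_error, Layout};

fn init_then_borrow(len: usize) -> Vec<u8> {
    assert!(len > 0, "`alloc` must not be called with a zero-sized layout");
    let layout = Layout::array::<u8>(len).unwrap();
    unsafe {
        // A raw pointer to uninitialized memory is not an invalid value.
        let ptr = alloc(layout);
        if ptr.is_null() {
            handle_alloc_error(layout);
        }
        // Initialize every byte through the raw pointer first.
        for i in 0..len {
            ptr.add(i).write(i as u8);
        }
        // Only now create the reference: it points to initialized u8 values.
        let bytes: &[u8] = std::slice::from_raw_parts(ptr, len);
        let copy = bytes.to_vec();
        // `bytes` is not used past this point, so it never dangles while live.
        dealloc(ptr, layout);
        copy
    }
}

fn main() {
    println!("{:?}", init_then_borrow(4));
}
```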
I agree that an invalid value could lead to undefined behaviour, but I'm still sceptical about an unused reference to it. It's interesting; I'll have to investigate that a little further.
Anyway, I think it's quite rhetorical here, since I suppose the purpose is to use that reference.
EDIT: @xmh0511 Isn't the type above rf: &[u8] instead of &[MaybeUninit<u8>], by the way? I took the latter for granted from the first post (or maybe I misunderstood what you meant), but it's apparently the former, so it looks indeed like a reference to invalid values. This won't cause UB, but this will with Miri.
No, what I meant is why [MaybeUninit<u8>;N] is also a valid value for the given uninitialized memory when MaybeUninit<u8> is valid for that memory. If there were no other rules that could clarify this point, the value of the array itself wouldn't be valid when its occupied memory was uninitialized.
You could have a reference rf: &[MaybeUninit<u8>] if you cast the raw pointer (either in from_raw_parts or when it's created), though I'm not entirely sure the whole sequence is how it should be done; someone please correct me if there's a better way:
let rf = std::slice::from_raw_parts(ptr as *mut MaybeUninit<u8>, 10); // #2
My earlier reply, and I think others' replies, were about that array of uninitialized MaybeUninit<u8>:
the reference pointer is not dangling because it's been allocated, and its array length is correct
it's not misaligned because you provided the correct layout
(with the modification above) all the values of the array are valid since they're MaybeUninit<u8> and haven't been marked as initialized
So it's fine for the array. Those values can also be read without any problem.
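Putting the three points together, a version of the first post's program that really does produce a &[MaybeUninit<u8>] might look like the sketch below. The function name uninit_slice_len and the null check are additions for the example.

```rust
use std::alloc::{alloc, dealloc, handle_alloc_error, Layout};
use std::mem::MaybeUninit;

fn uninit_slice_len() -> usize {
    let layout = Layout::array::<MaybeUninit<u8>>(10).unwrap();
    unsafe {
        // Cast right away so the pointer type matches the layout's element type.
        let ptr = alloc(layout).cast::<MaybeUninit<u8>>();
        if ptr.is_null() {
            handle_alloc_error(layout);
        }
        // Allocated, aligned, correct length, and every element is a MaybeUninit<u8>.
        let rf: &[MaybeUninit<u8>] = std::slice::from_raw_parts(ptr, 10);
        // Copying an element out as MaybeUninit<u8> does not assert that it is initialized.
        let first: MaybeUninit<u8> = rf[0];
        let _ = first;
        let len = rf.len();
        dealloc(ptr.cast::<u8>(), layout);
        len
    }
}

fn main() {
    println!("{}", uninit_slice_len());
}
```

Calling assume_init on any of those elements before writing to it would be a different matter, since that is the point where the code claims the byte is initialized.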
A &[u8] reference to uninitialized values, however, is on the list of UB, though that point is debated. Miri doesn't flag any UB as long as the reference is not used to read values, even if it's used to print its pointer value. But if you have a reference in your code, it's because you want to use it, so the code in the first post should normally lead to UB sooner or later, unless you change the type as first intended.
Once the code constructs not &[u8] but &[MaybeUninit<u8>]...
Why? The size is stored in the reference, not in the pointed-to memory, which may or may not be initialized. As for the data, the array's element type (MaybeUninit<u8>) admits any initialization status, so just what part of the array would require bytes to be init? Or what part of the documentation would say that such a thing is an invalid value?
I asked to find that document that says something like
underlying array type (MaybeUninit<u8>) admits any initialization status, so just what part of the array would require bytes to be init
I found that wording and cited it in my comments above.
A struct, tuple, and array requires all fields/elements to be valid at their respective type.
That means that if every element is a valid value, so is the array. Without such wording, we couldn't say that an array occupying uninitialized memory is valid when its elements are of type MaybeUninit<u8>.
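The same quoted sentence covers structs, so the struct case asked about earlier works out the same way. A minimal sketch, with Packet and build_packet as hypothetical names invented for the example:

```rust
use std::mem::MaybeUninit;

// Hypothetical struct: every field is (or is an array of) MaybeUninit.
struct Packet {
    len: MaybeUninit<u32>,
    payload: [MaybeUninit<u8>; 4],
}

fn build_packet() -> (u32, [u8; 4]) {
    // Safe code: each field is valid at its own type, so the struct is valid
    // even though none of its bytes have been initialized.
    let mut packet = Packet {
        len: MaybeUninit::uninit(),
        payload: [MaybeUninit::uninit(); 4],
    };

    // A reference to the whole struct is fine for the same reason.
    let r: &mut Packet = &mut packet;
    r.len.write(4);
    for (i, byte) in r.payload.iter_mut().enumerate() {
        byte.write(b'a' + i as u8);
    }

    // SAFETY: `len` and all four payload bytes were written above.
    let len = unsafe { r.len.assume_init_read() };
    let payload = r.payload.map(|byte| unsafe { byte.assume_init() });
    (len, payload)
}

fn main() {
    println!("{:?}", build_packet());
}
```

A struct with a plain u32 field would not get this treatment: that field would have to be initialized before a reference to the struct could be considered to point to a valid value.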