Is it UB to std::ptr::copy padding bytes?

From the Rustonomicon's list of UB causes:

raw pointer read from uninitialized memory

Since padding bytes are uninitialized, and std::ptr::copy is just a memmove that blindly copies every byte, won't it read uninitialized bytes whenever it copies a struct with padding? If this doesn't count, why not?

std::ptr::copy() does not need to assume that its arguments point to initialized memory. It can effectively treat the *const/mut T pointers as if they hold [MaybeUninit<T>] slices. It's only when we use a T or &[mut] T directly that it must be initialized. (The Rustonomicon is slightly inaccurate here: reading from an uninitialized *const/mut MaybeUninit<T> pointer is valid, since MaybeUninit values are allowed to have uninitialized bytes.)
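As a minimal sketch of that point, the following copies a struct that has padding with std::ptr::copy. The `Padded` struct and the `copy_padded` helper are hypothetical names made up for this illustration, and the "three padding bytes" remark assumes a typical target where u32 is 4-byte aligned.

```rust
use std::ptr;

// Hypothetical struct for illustration. With repr(C), typical targets place
// three padding bytes between `tag` and `value`.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
struct Padded {
    tag: u8,
    value: u32,
}

// Copies one `Padded` through raw pointers. The copy is untyped: the padding
// bytes are carried along in whatever state they are in, and nothing here
// reads them as a `u8` or any other type that must be initialized.
fn copy_padded(src: &Padded) -> Padded {
    let mut dst = Padded { tag: 0, value: 0 };
    // SAFETY: both pointers are valid for one `Padded`, properly aligned,
    // and `ptr::copy` permits overlap (there is none here anyway).
    unsafe { ptr::copy(src as *const Padded, &mut dst as *mut Padded, 1) };
    dst
}

fn main() {
    let src = Padded { tag: 7, value: 0xDEAD_BEEF };
    let dst = copy_padded(&src);
    assert_eq!(dst, src);
}
```

The comparison at the end only looks at the fields, so the padding is never inspected as a typed value.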

But usually the API for MaybeUninit gives you no way to do a read of the value without implicitly asserting that you know the data really is initialized (to read you have to call unsafe methods that have this as a precondition). So is MaybeUninit also considered to carry an additional superpower, namely making raw pointer reads of its underlying bytes not UB?

There's actually nothing special about "padding" bytes; they're the same as all uninitialized bytes. (You can also get uninitialized bytes by, say, looking at uninitialized parts of a Vec's buffer.)
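To illustrate the Vec case, here is a small sketch using Vec::spare_capacity_mut, which exposes the uninitialized tail of the buffer as a slice of MaybeUninit<T>. The `fill_spare` function name is an assumption made for this example.

```rust
use std::mem::MaybeUninit;

// Builds a Vec<u32> of length `n` by writing into its uninitialized buffer.
fn fill_spare(n: usize) -> Vec<u32> {
    let mut v: Vec<u32> = Vec::with_capacity(n);
    // The not-yet-initialized part of the buffer is typed as
    // MaybeUninit<u32>, so holding a reference to it is fine.
    let spare: &mut [MaybeUninit<u32>] = v.spare_capacity_mut();
    for (i, slot) in spare.iter_mut().take(n).enumerate() {
        slot.write(i as u32);
    }
    // SAFETY: the first `n` elements were just initialized, and `n` does not
    // exceed the capacity requested above.
    unsafe { v.set_len(n) };
    v
}

fn main() {
    assert_eq!(fill_spare(3), vec![0, 1, 2]);
}
```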

The problem is not about the reading specifically, but into which type you read it. Remember to look at the whole bullet:

an integer (i*/u*), floating point value (f*), or raw pointer read from uninitialized memory, or uninitialized memory in a str.

If you ptr::read::<u8>(p.cast()), then yes it's UB if p points to a padding byte. But if instead you ptr::read::<MaybeUninit<u8>>(p.cast()), then it's totally fine -- that reads uninitialized memory into something that can hold an uninitialized value.
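A rough sketch of the allowed form: read every byte of a padded struct, padding included, as MaybeUninit<u8>, and only call assume_init on a byte known to be initialized. `Padded`, `bytes_of`, and `first_byte` are hypothetical names for this illustration; the offset of `tag` being 0 follows from repr(C).

```rust
use std::mem::{size_of, MaybeUninit};
use std::ptr;

// Hypothetical struct for illustration; `value` exists only to force padding.
#[repr(C)]
#[allow(dead_code)]
struct Padded {
    tag: u8,
    value: u32,
}

// Reads the whole struct, padding included, into MaybeUninit<u8> bytes.
fn bytes_of(p: &Padded) -> [MaybeUninit<u8>; size_of::<Padded>()] {
    // SAFETY: `p` is valid for reads of size_of::<Padded>() bytes, the target
    // type has alignment 1, and MaybeUninit<u8> may hold uninitialized bytes.
    unsafe { ptr::read((p as *const Padded).cast()) }
}

// Extracts the byte at offset 0, which repr(C) guarantees is `tag`.
fn first_byte(p: &Padded) -> u8 {
    let bytes = bytes_of(p);
    // SAFETY: byte 0 belongs to the initialized `tag` field.
    // Calling assume_init on a padding byte instead would be UB.
    unsafe { bytes[0].assume_init() }
}

fn main() {
    let p = Padded { tag: 9, value: 1 };
    assert_eq!(first_byte(&p), 9);
}
```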

So, loosely, you can think of copy as copying MaybeUninit<_>s. (Technically it's an intrinsic, so it's its own thing: it is somewhat magic in that it also reliably copies pointer provenance and the like, which is hard to do in normal Rust code, especially if the underlying data contains pointers at unaligned addresses.)

But usually the API for MaybeUninit gives you no way to do a read of the value without implicitly asserting that you know the data really is initialized (to read you have to call unsafe methods that have this as a precondition).

You can read an uninitialized MaybeUninit<T> as another MaybeUninit<T>, but you cannot read it as an ordinary T. Effectively, a value of type T is always an "initialized T"; a MaybeUninit<T> is a "possibly uninitialized T".
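A short sketch of that distinction, with `init_later` as a made-up function name: the uninitialized MaybeUninit<u64> is read through a raw pointer into another MaybeUninit<u64>, and initialization is asserted only after a write.

```rust
use std::mem::MaybeUninit;
use std::ptr;

fn init_later() -> u64 {
    let a: MaybeUninit<u64> = MaybeUninit::uninit();
    // Reading an uninitialized MaybeUninit<u64> as MaybeUninit<u64> is fine:
    // the destination type is allowed to hold uninitialized bytes.
    // SAFETY: `&a` is a valid, aligned pointer to a MaybeUninit<u64>.
    let mut b: MaybeUninit<u64> = unsafe { ptr::read(&a) };
    // Only now does the value become an initialized u64.
    b.write(42);
    // SAFETY: `b` was initialized by the write above.
    unsafe { b.assume_init() }
}

fn main() {
    assert_eq!(init_later(), 42);
}
```

Calling assume_init on `a` or on `b` before the write would be the "read it as an ordinary T" case, which is UB.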

As it happens, MaybeUninit isn't particularly special. In fact, it's just an ordinary union:

#[repr(transparent)]
pub union MaybeUninit<T> {
    uninit: (),
    value: ManuallyDrop<T>,
}

Due to the semantics of unions in Rust, none of the fields are considered to be "real" until they are read from or written to. An uninitialized MaybeUninit can simply store a valid (), followed by padding bytes corresponding to the size and alignment of T. This means that there's no such thing as an invalid MaybeUninit<T> value, so long as the program has access to the underlying bytes.
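To see the same idea with a user-defined type, here is a sketch of a look-alike union. `Slot` and `fill_slot` are hypothetical names; the sketch omits #[repr(transparent)], on the assumption that applying it to a union is not available on stable Rust.

```rust
use std::mem::ManuallyDrop;

// Hypothetical look-alike of MaybeUninit, for illustration only.
union Slot<T> {
    uninit: (),
    value: ManuallyDrop<T>,
}

fn fill_slot() -> String {
    // Constructing via the `()` field leaves the rest of the bytes
    // uninitialized, which is a perfectly valid state for the union.
    let mut slot: Slot<String> = Slot { uninit: () };
    // Writing a ManuallyDrop field is safe: no old value gets dropped.
    slot.value = ManuallyDrop::new(String::from("hello"));
    // SAFETY: `value` was written just above, so it holds a valid String,
    // and it is not used again after being taken.
    unsafe { ManuallyDrop::take(&mut slot.value) }
}

fn main() {
    assert_eq!(fill_slot(), "hello");
}
```

Reading `slot.value` before the write is where the UB would come from, not from the existence of the uninitialized union itself.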

I took "or raw pointer read" to mean, in addition to the integer and floating point cases, any raw pointer read of any type, period. Otherwise it's unclear to me what separate situation the "or" is separating out. Or is this just saying that reading so-primitive-they-are-register-sized types out of uninit memory is UB, which includes pointer types themselves?

Looking at the exact wording:

an integer (i*/u*), floating point value (f*), or raw pointer read from uninitialized memory, or uninitialized memory in a str.

I think it's using "read" in the past participle form. That is, it really means:

([an integer (i*/u*), floating point value (f*), or raw pointer] that has been read from uninitialized memory), or (uninitialized memory in a str).

That is, you can't read uninitialized memory as an integer, floating point, or pointer value. This is because all of the bytes in such values must always be initialized.
