Can bytes become undef just from casting?

E.g. if I have a &mut [u8] which is pointing to zero initialized memory, and I cast it to a &mut MaybeUninit<T>, but then never write to it, from the point of view of the Rust abstract machine is my original &mut [u8] still valid or would that now be a reference to uninitialized memory?

What if instead I had cast it to &mut T directly, where T has padding bytes, but still never wrote anything?

What if instead I had cast it to &mut T directly, where T has padding bytes, and I write to individual fields but don't copy another T over it? (IIUC, if I copy another T over it, then the bytes in the &mut [u8] corresponding to the padding bytes become undef and break it, but maybe this is avoided when only assigning individual fields?)

The MaybeUninit type only requires you to initialize your slice before accessing it, and if it originally came from a &mut [T] we can assume the slice was initialized.

As long as you don't forget a field, initializing a struct field-by-field is perfectly fine.

use std::mem::MaybeUninit;

#[derive(Debug)]
struct Point {
    x: f32,
    y: f64,
}

fn main() {
    let mut p: MaybeUninit<Point> = MaybeUninit::uninit();

    unsafe {
        let ptr = p.as_mut_ptr();
        (*ptr).x = 1.0;
        (*ptr).y = 2.0;
    }

    let p = unsafe { p.assume_init() };

    println!("{:?}", p);
}

Running this under Miri produces no warnings of UB.

Can't this be UB due to creating a place expression with uninitialized value?

Yeah. I switched the type of the y field to something more complex (a String) and now Miri notices the issue.

error: Undefined Behavior: type validation failed at .pointer.pointer: encountered uninitialized raw pointer
   --> /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/raw_vec.rs:224:9
    |
224 |         self.ptr.as_ptr()
    |         ^^^^^^^^ type validation failed at .pointer.pointer: encountered uninitialized raw pointer
    |
    = help: this indicates a bug in the program: it performed an invalid operation, and caused Undefined Behavior
    = help: see https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html for further information
            
    = note: inside `alloc::raw_vec::RawVec::<u8>::ptr` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/raw_vec.rs:224:9
    = note: inside `std::vec::Vec::<u8>::as_mut_ptr` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/vec/mod.rs:1178:19
    = note: inside `<std::vec::Vec<u8> as std::ops::Drop>::drop` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/vec/mod.rs:2888:62
    = note: inside `std::ptr::drop_in_place::<std::vec::Vec<u8>> - shim(Some(std::vec::Vec<u8>))` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ptr/mod.rs:487:1
    = note: inside `std::ptr::drop_in_place::<std::string::String> - shim(Some(std::string::String))` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ptr/mod.rs:487:1
note: inside `main` at src/main.rs:13:9
   --> src/main.rs:13:9
    |
13  |         (*ptr).y = String::new();
    |         ^^^^^^^^

note: some details are omitted, run with `MIRIFLAGS=-Zmiri-backtrace=full` for a verbose backtrace

I guess the theoretically correct way to initialize the field would be with std::ptr::addr_of_mut!((*ptr).x).write(1.0).
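For reference, here is a minimal sketch of that pattern applied to a struct with a String field. The `Named` struct and the `init_named` function are made up for illustration; the point is only that `addr_of_mut!` produces a raw pointer to the field without reading or dropping whatever bytes are already there.

```rust
use std::mem::MaybeUninit;
use std::ptr::addr_of_mut;

// Hypothetical struct, for illustration only.
#[derive(Debug)]
struct Named {
    id: u32,
    name: String,
}

// Initializes every field through raw pointers, so no old value is dropped
// and no reference to uninitialized memory is created.
fn init_named() -> Named {
    let mut slot: MaybeUninit<Named> = MaybeUninit::uninit();
    let ptr = slot.as_mut_ptr();
    unsafe {
        addr_of_mut!((*ptr).id).write(7);
        addr_of_mut!((*ptr).name).write(String::from("seven"));
        // Every field has been written, so the value is fully initialized.
        slot.assume_init()
    }
}

fn main() {
    let n = init_named();
    println!("{:?}", n);
}
```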

However...

... Nobody is ever going to write code like that in practice. It's an objectively terrible language decision and the (*ptr).x = 1.0 syntax is superior in almost every way (direct analogy with setting a field on a &mut T , you write 14 characters that are spaced out versus a chain of 43 characters with no whitespace, no need to introduce *raw T pointers and all that, etc.).

The rule is this:

  • To write to an uninitialized Copy field, (*ptr).x = 1.0; is sufficient.
  • To write to an uninitialized !Copy field, addr_of_mut!((*ptr).y).write(String::new()); is necessary.[1]

The difference is that writing to a !Copy field runs the destructor on the existing value. If the type of the field, or any of its recursive subobjects, has a Drop impl, then it will create an uninitialized &mut self reference, which is not allowed.


  1. unless !core::mem::needs_drop::<T>(), where T is the field type
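To see which side of the rule a given field type falls on, one can query `needs_drop` directly. This is only a sketch: `PlainNotCopy` and `OwnsHeap` are hypothetical types, and the documentation describes `needs_drop` as a hint that may conservatively return true, so treat the result as "plain assignment might run a destructor" rather than a precise guarantee.

```rust
use std::mem::needs_drop;

// Hypothetical field types, for illustration only.
// Not Copy, but contains nothing that has a destructor.
#[allow(dead_code)]
struct PlainNotCopy {
    a: u32,
    b: f64,
}

// Not Copy, and owns a String, which does have a destructor.
#[allow(dead_code)]
struct OwnsHeap {
    name: String,
}

// Returns true when `place = value` for a field of type T may run drop glue
// on the old contents, i.e. when the old contents must already be valid.
fn assignment_may_drop<T>() -> bool {
    needs_drop::<T>()
}

fn main() {
    println!("f32: {}", assignment_may_drop::<f32>());
    println!("PlainNotCopy: {}", assignment_may_drop::<PlainNotCopy>());
    println!("String: {}", assignment_may_drop::<String>());
    println!("OwnsHeap: {}", assignment_may_drop::<OwnsHeap>());
}
```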

The problem here isn't that we have an uninitialized place expression, but that place = value effectively runs ptr::drop_in_place(&raw mut place), unless its type is Copy. Since that can call a Drop::drop() impl, which takes a &mut self reference, it must already be initialized with a valid value. The only obvious fix I can see would be to make place expressions track whether they come from a pointer, and only drop the previous value if they don't. But that would result in the opposite confusion of pointer writes causing memory leaks.
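The difference can be observed on a place that is already initialized, where both forms are sound. The sketch below uses a made-up `Noisy` type that counts its drops in a `Cell`; the holder is wrapped in `ManuallyDrop` so that only the drop caused by the assignment itself is counted.

```rust
use std::cell::Cell;
use std::mem::ManuallyDrop;
use std::ptr::addr_of_mut;

// Hypothetical type that bumps a counter every time it is dropped.
struct Noisy<'a>(&'a Cell<u32>);

impl Drop for Noisy<'_> {
    fn drop(&mut self) {
        self.0.set(self.0.get() + 1);
    }
}

struct Holder<'a> {
    field: Noisy<'a>,
}

// Plain assignment through a raw pointer: the old field value is dropped first.
fn drops_on_assign() -> u32 {
    let counter = Cell::new(0);
    let mut h = ManuallyDrop::new(Holder { field: Noisy(&counter) });
    let ptr: *mut Holder<'_> = &mut *h;
    unsafe {
        (*ptr).field = Noisy(&counter);
    }
    counter.get()
}

// Raw write: the old field value is overwritten without being dropped.
fn drops_on_write() -> u32 {
    let counter = Cell::new(0);
    let mut h = ManuallyDrop::new(Holder { field: Noisy(&counter) });
    let ptr: *mut Holder<'_> = &mut *h;
    unsafe {
        addr_of_mut!((*ptr).field).write(Noisy(&counter));
    }
    counter.get()
}

fn main() {
    println!("drops after assignment: {}", drops_on_assign());
    println!("drops after raw write:  {}", drops_on_write());
}
```

If the field were uninitialized, the first form would run `Noisy::drop` on garbage, which is exactly the case Miri flagged above.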

Ah, I missed that. However, isn't creating a ref-to-uninitialized still UB in the case of Copy types? That could still be fixed by my suggestion.

Indeed, but I'm not sure how your suggestion would actually help here. The ideal case would be to have two simple forms of assignment, of which one drops the previous value (asserting its validity) and the other ignores it. Currently, the shortest way to express that is with addr_of_mut!():

(*ptr).y = String::new();                    // drop previous value
addr_of_mut!((*ptr).y).write(String::new()); // forget previous value

But the syntax of the second form is long and unintuitive compared to the first. I've seen two proposed alternatives, the first from RFC 2582 and the second from "Rust's Unsafe Pointer Types Need an Overhaul":

(&raw mut (*ptr).y).write(String::new());
ptr~y.write(String::new());

I don't really have any horse in this race myself, though.

Then this would be the memory leak, I guess?

let mut boxed = Box::new(42);
let ptr = &mut boxed as *mut Box<_>;
unsafe { *ptr = Box::new(84); } // under that rule, the old Box holding 42 is never freed

Indeed. In fact, it would happen for every heap-allocated object, including Strings, Vecs, and Rcs. So we have a few options:

  1. The current status quo:
    (*ptr).field = value;                    // to drop prior value
    addr_of_mut!((*ptr).field).write(value); // to forget prior value
    
    This is simple for already-initialized fields but long and unintuitive for uninitialized fields.
  2. An inversion of the current status quo, combined with @H2CO3's suggestion not to assert validity for *&mut place:
    *&mut (*ptr).field = value; // to drop prior value
    (*ptr).field = value;       // to forget prior value
    
    This requires a new intuition, of the differences between place expressions derived from pointers and those derived from references and variables. Also, it's very incompatible with the current status quo, but it could possibly be done on an edition boundary. I'm not a huge fan of this in general, since beginners to unsafe Rust can already easily get confused over the requirements of places vs. references derived from places.
  3. The current status quo, but with simpler syntax for forgetting the prior value:
    (*ptr).field = value;   // to drop prior value
    ptr~field.write(value); // to forget prior value
    /* or any similar syntax */
    
    This seems like the most attractive option to me; it allows users to write into several fields with the same syntax without their code becoming cluttered with addr_of_mut!(). The big downside is the amount of bikeshedding necessary before any new syntax can become a reality.
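Until something like option 3 exists, the clutter can be reduced a little in user code. The following is only a sketch: `write_field!` is a hypothetical macro, not part of std, and it does nothing more than expand to the `addr_of_mut!(...).write(...)` form from option 1. The caller still needs an unsafe block and still has to make sure every field is written before `assume_init`.

```rust
use std::mem::MaybeUninit;

// Hypothetical helper macro (not in std): writes `$value` into `$field`
// behind the raw pointer `$ptr` without dropping the previous contents.
macro_rules! write_field {
    ($ptr:expr, $field:ident, $value:expr) => {
        ::std::ptr::addr_of_mut!((*$ptr).$field).write($value)
    };
}

// Hypothetical struct, for illustration only.
#[derive(Debug)]
struct Pair {
    x: f32,
    label: String,
}

fn build_pair() -> Pair {
    let mut slot = MaybeUninit::<Pair>::uninit();
    let ptr = slot.as_mut_ptr();
    unsafe {
        // Same syntax for fields with and without drop glue.
        write_field!(ptr, x, 1.0);
        write_field!(ptr, label, String::from("one"));
        slot.assume_init()
    }
}

fn main() {
    println!("{:?}", build_pair());
}
```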
