Why is dereferencing MaybeUninit::uninit().as_mut_ptr() safe?

This is from the std source code,
demonstrating how to initialize a field of a struct constructed through MaybeUninit:

let mut uninit = MaybeUninit::<Demo>::uninit();
// `&uninit.as_mut().field` would create a reference to an uninitialized `bool`,
// and thus be Undefined Behavior!
let f1_ptr = unsafe { ptr::addr_of_mut!((*uninit.as_mut_ptr()).field) };
unsafe { f1_ptr.write(true); }
let init = unsafe { uninit.assume_init() };
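
For reference, here is a self-contained sketch of the snippet above. The `Demo` struct with a single `bool` field and the `init_demo` wrapper function are assumptions added so that the snippet compiles on its own; the thread does not show them.

```rust
use std::mem::MaybeUninit;
use std::ptr;

// Assumed definition: the thread does not show `Demo`.
struct Demo {
    field: bool,
}

// Hypothetical wrapper so the snippet can be called and checked.
fn init_demo() -> Demo {
    let mut uninit = MaybeUninit::<Demo>::uninit();
    // Compute the field's address without creating a reference to it.
    let f1_ptr = unsafe { ptr::addr_of_mut!((*uninit.as_mut_ptr()).field) };
    // Write through the raw pointer; nothing uninitialized is read.
    unsafe { f1_ptr.write(true); }
    // Every field is now initialized, so `assume_init` is sound.
    unsafe { uninit.assume_init() }
}
```

Calling `init_demo()` yields a `Demo` whose `field` is `true`.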

Ah, I see. It's safe because they don't actually read any uninitialized data anywhere.

How can I know that it does not read? Technically, though, it does not need to.

The ptr::addr_of_mut! macro guarantees it.

If it does not really read, why is addr_of_mut not implemented as

&raw mut $place  ========>  &mut $place as *mut _

Technically, such a reference never reads the referent. But it is not safe.

Creating a mutable reference to an invalid value (e.g. uninitialized memory) is UB, which your suggestion would do. The &raw mut operator doesn't do this.

It is said to be UB, but why?

The language is free to say that something is UB if it wants to. It doesn't need a reason.

That said, you may find this helpful.

Yes. It is free to declare something is UB.
What I am interested in is this: among the conditions declared UB, are there some that are not really UB in practice, or does something actually happen in each case, and if so, what is it?

Things like "you are not to do this, it is dangerous, but I won't tell you anything" only make me afraid.

The issue here may be something about the syntax of the macro being potentially confusing

  • I suggested at the time that another syntax be used, precisely to avoid this:

    ptr::raw_op!( &raw mut (*uninit.as_mut_ptr()).field )
    

I imagine such a syntax would be fine by you, @zylthinking, since you seem to be fine with the declared UB pattern of &mut (*uninit.as_mut_ptr()).field as *mut _.

Now, it just so happens that the true macro(s) will insert the &raw operators themselves:

ptr::addr_of_mut!( <place> ) ≝ &raw mut <place>


Additionally, see the definition of the * operator in Rust (emphasis mine):

  • Source

  • Emphasis on "denoting a location", which in Rust is called a place: something that you can take the address of.

As you can see * just denotes a location, it does not necessarily trigger UB on its own. It triggers UB on certain instances (again, emphasis mine):


From both these snippets we deduce that dereferencing a pointer that dangles (is null or points to a non-empty span of bytes contained in no memory allocations) is indeed UB, even without reading or writing to that place.

But, if the memory does match a valid memory allocation, such as the one from a local variable (the MaybeUninit in your example), then we dodge that UB. The second source of UB would be to "read uninit memory", which is not UB per se. It is only UB:

  • If the read actually happens,

  • so as to produce an invalid value, i.e., a value for a type for which uninitialized bytes are not a valid "bit-pattern". And the only case where uninitialized bytes are not forbidden is for padding bytes: so for almost all the non-zero-sized types in Rust, reading uninit memory for that type is indeed UB.

    • But do note that there are some types that can be filled with padding bytes (let's call them uninit-compatible types):

      • zero-sized types;

      • #[repr(C)] unions having at least one variant of uninit-compatible type, such as MaybeUninit<T>;

        • EDIT: damn, it's been 9 days that I had typo'd here enum instead of union, and nobody noticed? :smile:
      • structural compositions of such (i.e., structs, tuples, tuple structs whose fields all are of uninit-compatible types, as well as arrays of such).
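
To illustrate the list above: an array of `MaybeUninit<T>` is itself an uninit-compatible type, so calling `assume_init` on an uninitialized array of `MaybeUninit<u32>` is fine. A minimal sketch, where the function name `make_squares` and the choice of `u32` are purely illustrative:

```rust
use std::mem::{self, MaybeUninit};

// Hypothetical helper: builds `[0, 1, 4, 9]` element by element.
fn make_squares() -> [u32; 4] {
    // Fine: the type we claim to have initialized is an array of
    // `MaybeUninit`s, which do not require initialization.
    let mut slots: [MaybeUninit<u32>; 4] =
        unsafe { MaybeUninit::uninit().assume_init() };

    // Initialize every element through `MaybeUninit::write`.
    for (i, slot) in slots.iter_mut().enumerate() {
        slot.write((i * i) as u32);
    }

    // All four elements are initialized now, so reinterpreting the array
    // as `[u32; 4]` no longer produces an invalid value.
    unsafe { mem::transmute::<[MaybeUninit<u32>; 4], [u32; 4]>(slots) }
}
```

Doing the same with `MaybeUninit::<[u32; 4]>::uninit().assume_init()` would not be fine, since `u32` is not an uninit-compatible type.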

But in the OP's case, this doesn't even matter! Indeed, no value is produced whatsoever when referring to that place (except the value of the address of that place).


Well, for a formal model to be fully usable, it needs to be fully compositional. This is better seen with counter-examples / examples of what not to do. The wonderful C language has generously donated many such examples :grin::

  1. Given:

    // struct Foo { a: u8, b: u16 }
    typedef struct { uint8_t a; uint16_t b; } Foo;
    
  2. Then, the following is fine:

    size_t offset_of_b = (size_t) &( // compute the address of:
        ((Foo *) NULL) // the `NULL` pointer to a `Foo`
        -> b // "dereferenced" to designate the place of its `b` field
    );
    
  3. But the following is not:

    Foo * null_foo = NULL;
    size_t offset_of_b = (size_t) &null_foo->b;
    

Indeed, the C rules only allow dereferencing NULL literally if we take the address of the result right in the same expression. The moment any intermediate layer such as a function call or a local binding intervenes, we are technically breaking the C rules, and get UB.
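
For comparison, here is a Rust sketch of the same offset computation that stays within the rules discussed so far: it dereferences a raw pointer into a real (if uninitialized) allocation and takes the field's address with `ptr::addr_of!`, so neither a reference nor a value is ever produced. The `#[repr(C)]` attribute and the `offset_of_b` helper are assumptions added for the example.

```rust
use std::mem::MaybeUninit;
use std::ptr;

// `repr(C)` is assumed here so that the field layout is predictable.
#[repr(C)]
#[allow(dead_code)]
struct Foo {
    a: u8,
    b: u16,
}

// Hypothetical helper: byte offset of `b` inside `Foo`.
fn offset_of_b() -> usize {
    // A real allocation, so the pointer below does not dangle.
    let uninit = MaybeUninit::<Foo>::uninit();
    let base: *const Foo = uninit.as_ptr();
    // `(*base).b` only designates a place; `addr_of!` takes its address
    // without reading it and without creating a reference.
    let b_ptr: *const u16 = unsafe { ptr::addr_of!((*base).b) };
    (b_ptr as usize) - (base as usize)
}
```

Unlike the C version, the local binding `base` changes nothing here, because the rule is the same no matter how the pointer was obtained. Recent Rust versions also provide `std::mem::offset_of!` for this particular job.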

Rust, in that regard, is way more consistent: while it could have been acceptable to declare &mut <place> as *mut _ non-UB, and define it as being a way to express &raw mut <place>, so doing would have been very fragile, error-prone, and thus, foot-gunny.

Indeed,

  • { &mut <place> } as *mut _

  • identity(&mut <place>) as *mut _

  • { let p = &mut <place>; p as *mut _ },

would all still be UB, and we'd be in no better position than C!

So, either all these &mut <place> are UB when <place> is invalid, or none are; but with no in-betweens since that's inconsistent / footgunny.

Now, you may still want to know why Rust chose to have some of the &mut <place> be UB (which thus makes them all UB, for consistency / compositionality).

Here is the main example:

fn fun<T> (p: &mut T, n: usize)
where
    T : Copy, // e.g., `T = i32` if you want
{
    for i in 0 .. n {
        let value = *p;
        stuff(value);
    }
}

This is a typical situation where the compiler may prefer to hoist the read-dereference of p before the loop body, so as to avoid performing too many reads:

// Compiler optimization
fn fun<T> (p: &mut T, n: usize)
where
    T : Copy, // e.g., `T = i32` if you want
{
+   let value = *p;
    for i in 0 .. n {
-       let value = *p;
        stuff(value);
    }
}

This is a valid transformation independently of the guarantees of &mut T when n > 0. But, alas, the n = 0 case would suffice to make this optimization illegal, unless &mut T had the validity invariants it currently has: by virtue of being an unaliased and always dereferenceable reference to a valid instance of type T, the compiler is allowed to "spuriously dereference it" if it so wishes. And it definitely wishes to do so in this instance!
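
A runnable sketch of the two versions, to make the n = 0 case concrete. The `stuff` callback is passed in as a closure parameter here (an assumption, since the original leaves `stuff` undefined), and the hoisting is written out by hand; a real compiler would do this on its intermediate representation rather than on source code.

```rust
// The function as written by the programmer.
fn fun_original<T: Copy>(p: &mut T, n: usize, mut stuff: impl FnMut(T)) {
    for _ in 0..n {
        let value = *p; // read on every iteration
        stuff(value);
    }
}

// The function as the optimizer would like to see it.
fn fun_hoisted<T: Copy>(p: &mut T, n: usize, mut stuff: impl FnMut(T)) {
    let value = *p; // single read, performed even when `n == 0`
    for _ in 0..n {
        stuff(value);
    }
}
```

With a valid reference, both versions make the same calls to `stuff` for every `n`. The difference is that `fun_hoisted` reads `*p` even when `n` is 0, which is only acceptable because a `&mut T` is guaranteed to point to a valid `T`.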

From all this, the good mental model with Rust references is:

The moment you produce a Rust reference and up until the point it is last used (the end of a function's body if it is a function's parameter), the compiler may spuriously read (and for &mut, even write!)-dereference such references at any point.

So, if we are back at your &mut <place> as *mut _, since this is not an "atomic" / leaf compiler operation, the compiler is allowed to transform it into

{
    let p = &mut <place>;
    let value = *p; // ill-defined if `<place>` contains an invalid value
    p as *mut _
}

I think you are making a common mistake about the nature of UB. Undefined behavior does not have to be a segfault or a data race or anything that "really" happens. If it did, you could detect it in a debugger or probe it with a logic analyzer. Undefined behavior is a gap in the specification of the language, where you can write code that follows all the syntactic rules, but doesn't correspond to any meaningful semantic interpretation. Since there's no meaningful interpretation, the compiler may assume you meant anything. It may happen to generate code that offsets the pointer in the way you expect. But it also may not. And it might work today, but stop working on Tuesday, or whatever.

Undefined behavior doesn't really happen in an executing program, it "happens" during translation between abstraction layers. Rust has undefined behavior with respect to LLVM, and LLVM has undefined behavior with respect to machine code, and machine code even has undefined behavior with respect to hardware. And hardware itself has undefined behavior with respect to, well, physics. These abstraction layers intentionally have gaps to allow freedom of implementation. Ultimately all behavior is "defined" somewhere along the line, in the sense that digital computers are deterministic machines and when you run the program something is going to happen. But that's not a particularly useful way of analyzing programs, so when you're talking about code written in Rust, the only thing that really matters is whether Rust defines the behavior or not.

Things like "you are not to do this, it is dangerous, but I won't tell you anything" only make me afraid.

Yes, the undefinedness of undefined behavior is exactly what makes it the most fearsome kind of error.


What I am saying is that UB should be limited.
Not everything called UB is really UB.

UB is not UB when it has no actual effect.
Using something as a type differs from using it as a value.

A bool with the value 3 is UB only when it is used as a value.
size_of::<bool>() is always 1, even if the bool contains a 3.
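
To make the bool example concrete: as the language defines it, the problem is producing a value of type `bool` that is neither 0 nor 1, regardless of how that value is used afterwards. A small sketch of what stays within the rules, assuming the byte is kept inside a `MaybeUninit<bool>` and only ever accessed as a `u8` (the helper name `stash_byte` is hypothetical):

```rust
use std::mem::{size_of, MaybeUninit};

// Hypothetical helper: stores `byte` in storage sized for a `bool`
// and reads it back, without ever producing a `bool` value.
fn stash_byte(byte: u8) -> u8 {
    // The size of `bool` does not depend on what the storage contains.
    debug_assert_eq!(size_of::<bool>(), size_of::<u8>());

    let mut slot = MaybeUninit::<bool>::uninit();
    // Store an arbitrary byte in the storage reserved for a `bool`.
    unsafe { slot.as_mut_ptr().cast::<u8>().write(byte) };
    // Calling `slot.assume_init()` at this point would be UB unless
    // `byte` is 0 or 1, so the byte is read back as a `u8` instead.
    unsafe { slot.as_ptr().cast::<u8>().read() }
}
```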

A reference to uninitialized memory should not be UB when it is used as a type.
&(*uninit.as_mut_ptr()).field is indeed used as a type.

It should not be considered UB, such that a macro has to be introduced to keep it safe.

This makes no sense, how can UB not be UB? It's undefined behaviour, thus is undefined behaviour.
Maybe you meant that "undefined behaviour" (which is what UB commonly means) is not always "unexpected behaviour"? You may be right, but the language/compiler is free to make it "unexpected behaviour" whenever it wants, or maybe it already does but you haven't realized yet.

Even if it does what you expect it to do, it's still UB. UB doesn't mean miscompilation, crashes, and things like that; it just means that what happens is not defined.

How can a value be used as a type and vice versa? Maybe you mean as a place?

Except it is UB, because it creates a reference, and that implies the value is initialized. Knowing that a value is initialized while it is actually not initialized can actually lead to unexpected behaviour in practice, even if the uninitialized value is never explicitly read. Take for example this code where you can see the compiler optimized the entire function to a single ret, thus returning uninitialized data and never correctly setting the foo field.


Or maybe the function just does nothing because mem::size_of::<Baz>() == 0? :wink:

Also, you’re just assigning to (*uninit.as_mut_ptr()).foo here, not creating a reference to it, and Foo doesn’t implement Drop, so I’d question if there’s any UB in your code in the first place, even if Foo had more than one variant or was #[repr(u8)] or similar.
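
The remark about Drop matters because a plain assignment to a place first drops the old value stored there. A sketch of the difference, using a hypothetical `Named` struct with a `String` field (a type that does have drop glue):

```rust
use std::mem::MaybeUninit;
use std::ptr::addr_of_mut;

// Hypothetical struct whose field owns heap memory.
struct Named {
    name: String,
}

// Hypothetical helper that initializes `Named` in place.
fn init_named() -> Named {
    let mut uninit = MaybeUninit::<Named>::uninit();
    let ptr = uninit.as_mut_ptr();
    unsafe {
        // `(*ptr).name = ...` would first drop the old, uninitialized
        // `String`, which is UB. `write` overwrites the field without
        // reading or dropping the previous contents.
        addr_of_mut!((*ptr).name).write(String::from("Bob"));
        // The only field is initialized, so this is sound.
        uninit.assume_init()
    }
}
```

For a field type without drop glue, such as a plain fieldless enum, the assignment form has no old value to drop, which is the point being made above.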

Ugh, I guess I should have added a repr(C) or something like that to the enum.

The Rust language was designed by experts, and its UBs were chosen carefully.

If you think you can do better, well, you can think whatever you want. But I doubt you will be able to.

But I assure you that every instance of UB in Rust was checked many times before it was decided that the optimization opportunities it provides are worth the additional complexity.

This is not a nice way to present your idea.


As for doing better, there are a large number of instances where the choice between something being UB or not UB is more a natural outcome of the language design or its underlying implementation, rather than a question of doing better.

As the tradeoffs in software engineering change over time, so does the ideal solution. Some 40 years ago, when the first C standards were written down by people no less competent than those who work on Rust today, the design of the language and its list of undefined behaviours likely made much more sense in the context of the time than they do right now. It is not all that unlikely that, some years down the line, the choices made by Rust won't make as much sense as they do today, either.


Excellent. Thanks for the deep detail.


I don't understand the last part.

Surely &mut <place> as *mut _ can be translated to

let p = &mut <place>;
let value = *p; // ill-defined if `<place>` contains an invalid value
p as *mut _

This is possible in principle but rare, since there is no need to read the value in an expression like &mut <place> as *mut _.

That is beside the point, however. The point is: why would such a translation cause UB?

    let p = &mut <place>;
    let value = *p; // ill-defined if `<place>` contains an invalid value

value can be invalid if and only if p has already broken the reference guarantee.

That is not because of p as *mut _.

OK, I understand at last.
It is because the place comes from a MaybeUninit, so let p = &mut <place>; will definitely break the reference guarantee.
