To nitpick, Vec
's element containing len
initialised elements is a safety invariant and not a validity invariant*, meaning there is no insta-UB when doing that call, even though the Vec
is now holding extra uninitialized elements.
This becomes an issue if this unsafe-but-"valid" state gets to be witnessed by any code outside the one in scope. There are mainly three places where this can occur:
-
if the state is not fixed before the function returns; this is the case of the code shown in the OP, it will thus very probably lead to UB triggering on the caller's code (e.g., when dropping the Vec
if ptr::needs_drop::<T>()
).
-
If arbitrary code is called before the function "returns", e.g., if invoking a given closure, or some trait method of T
. This can be sneaky since it can happen on a Drop
, which in turn can be called early if a panic!
occurs. It is not the case in the code shown in the OP (it would have been, for instance, if there was a println!
before the ptr::write
call! Or if instead of index
, the expression index - 1 + 1
had been used, since there could have been a panic!
for index = 0
on underflow), since get_unchecked_mut
(with a valid index) and ptr::write
cannot panic!
.
-
the very code of the function leads to witnessing an invalid state. This is the case of the code shown in the OP, when they get a &mut T
at vec.get_unchecked_mut(index)
knowing that the pointee has not been initialized yet. As @cuviper mentioned, these rules may get loosened a bit in the future (specially when T
is an integer or floating-point type), but until this loosening happens (if ever), this is Undefined Behavior in current Rust.
* Safety vs. Validity invariant
This is best seen with an example. Take the following code:
use ::core::mem;
fn sound (s: &mut str)
{
// No matter the input, the function is sound to call
// => no need to mark the function `unsafe`
unsafe {
match s.as_bytes_mut().last_mut() {
| Some(at_last_byte) => {
let prev = mem::replace(at_last_byte, 0xa0);
// Danger zone: s points to non-UTF8 bytes
*at_last_byte = prev;
// Good, we have re-established the invariant
},
| _ => {},
}
}
}
unsafe // "may" be UB to call => functions needs to be unsafe
fn unsound (mut s: &str)
{
let prev = mem::replace(&mut s, mem::zeroed());
s = prev;
}
Now, both functions do the "same" thing: they break an invariant of the type &str
, and then restore it back before anything else happens. The issue lies with the kind of invariants that we are dealing with:
-
a &str
(and more generally, &T
for any type T
), is a Rust reference, which has to be non-null. The compiler may rely on this invariant (and it does!).
⟹ It is a validity invariant.
-
a &str
is a reference to a slice of bytes (i.e., a &[u8]
), with the added property that these bytes must form a valid UTF-8 sequence. Library / programmer code may rely on this invariant, but not the compiler (e.g., the invariant is too complex for the compiler to exploit it).
⟹ It is a safety invariant.
The function unsound
breaks the non-null validity invariant, thus instantaneously triggering UB, so by the time the null-pointer is erased it is too late: validity invariants must never be broken, not even for an instant!
The function sound
breaks the "valid UTF-8" safety invariant, but fixes it right away before any code or anything (besides the compiler) witness this state. And since the compiler does not assume anything regarding the UTF-8 validity, no harm is done: safety invariants can be temporarily broken so as long as only code in scope deals with the broken state, provided this code does not trigger UB in so doing.
Back to Vec
, one can see Vec
as:
/// # Safety invariant:
///
/// - `raw_buffer[.. len]` holds valid values of type `T`
pub
struct Vec<T> {
raw_buffer: Box<[MaybeUninit<T>]> // ptr, capacity
len: usize,
}
impl<T> Vec<T> {
/// # Safety
///
/// - the safety invariant must not be broken
pub
unsafe
fn set_len (self: &'_ mut Self, len: usize)
{
self.len = len;
}
pub
fn as_mut_slice (self: &'_ mut Self)
-> &'_ mut [T]
{
unsafe {
// # Safety
//
// - the safety invariant guarantees that this is sound
::core::slice::from_raw_parts_mut(
self.raw_buffer
.as_mut_ptr() // : *mut MaybeUninit<T>
as *mut T
,
self.len,
)
}
}
}
impl<T> Drop for Vec<T> {
fn drop (self: &'_ mut Self)
{
unsafe {
// Safety: from safety invariant: we own at least `len` elements
::core::ptr::drop_in_place(self.as_mut_slice());
}
}
}
So, as you can see, calling .set_len()
does not break anything w.r.t. compiler assumptions, as we are just modifying a usize
field of some struct. We are however, breaking the safety invariant of the Vec
type, which is used to soundly perform other unsafe
operations, such as with as_mut_slice
casting/transmuting a slice of MaybeUninit<T>
to a slice of T
, by using and trusting the value of its len
field.
In the example of the OP, as_mut_slice
was called by the DerefMut
impl of Vec
, which was implicitly called by .
(dot operator) auto-deref. So the funny thing is, no matter what the argument of that .get_unchecked_mut
call was (that is, it could have been some value < old_len
), it would still have been UB due to temporarily creating a &mut [T]
of len index + 1
, thus containing uninitialised T
s in it, which (currently) is a violation of a validity invariant and thus instant UB.