Is this correct way to write uninitialized bytes?

I am not sure if this code is safe...

use std::io::Read;

fn read_data(buf: &mut Vec<u8>) {
    let data = "Hello, World!";
    let mut reader = std::io::Cursor::new(data);
    unsafe {
        buf.reserve(data.len());
        let len = buf.len();
        let uninit = std::slice::from_raw_parts_mut(
            buf.as_mut_ptr().add(len),
            data.len()
        );
        reader.read_exact(uninit).unwrap();
        buf.set_len(len + data.len());
    }
}

Is it safe ? No ?

Yes, but there's an important caveat: This depends on implementation details of reader! If you are writing a library and accept anything that implements Read, then you are not allowed to do this. It's only ok if you know that reader is well behaved. (e.g. it would be ok with a TcpStream or File, but not an arbitrary impl Read)

To elaborate, a reader that isn't well behaved could read from the provided buffer, which causes UB when it is uninitialized. Obviously, no well behaved reader will ever read from the buffer, but creating a reader that does this does not require unsafe, and if your program has UB and only a single unsafe block, then that unsafe block is at fault, no matter how silly the safe code is.

Another caveat: Having an uninitialized u8 is definitely not ok. Based on this, there has been a lot of discussion on whether an &mut [u8] is allowed to exist at all when the bytes behind it are uninitialized. However, it seems like the current consensus is that having an &mut [T] pointing at memory that's invalid for T is ok as long as you don't read it.

5 Likes

Would that mean that the code is, strictly speaking, unsound, unless std gave corresponding guarantees?

Since it is an implementation detail, you can make that argument. However, I would claim that not making such a change to TcpStream in a future release of Rust is a de-facto promise, even if it's not explicitly mentioned in the standard library.

1 Like

This could in principle be worked around by writing a sealed trait SealedRead: Read that is only implemented for types known to have a well-behaved Read impl.
That effectively offers a type-level filter on which types are allowed, by using as a parameter r: impl SealedRead rather than r: impl Read.

That wouldn't be a 100% foolproof solution, since a previously working Read impl could be changed (perhaps optimized) in a way that breaks such an assumption. But for readers in particular it seems unlikely that optimizations would involve reading the byte buffer.

BorrowedBuf aims to solve the problem from the other end by having the caller specify the state of the buffer they pass

2 Likes

No, it's not safe. You're taking uninitialized memory and basically transmuting it into a &mut [u8]. This is UB. Uninitialized memory isn't just "filled with random data", it doesn't have any value at all. You may think of it as if each bit had 3 states: true, false, and uninit. A real u8 is composed only of ordinary two-value bits, and so is a &mut [u8], but an uninitialized buffer has all its bits set to "uninit". For details, see these posts [1] [2] by Ralf Jung.

There is no safe way to do what you want, because the Read trait doesn't allow using uninitialized buffers. All buffers accepted by its methods are of the form &mut [u8], which implies that the entire buffer is always fully initialized, whatever its value is. If you want to deal with uninitialized memory, you should use the MaybeUninit type and read/write data only using raw pointers (*const T/*mut T). You should never, at any point in the program, have a safe reference (&/&mut) to uninitialized data! (unless it's something like &MaybeUninit<T>, which explicitly allows the data to be uninitialized).

This has no relation to the safety of this code. You are producing an invalid value of the type (a slice &mut [u8] filled with uninit bytes), and that is immediate language-level UB, regardless of the way you may be using it. For example, the compiler is free to dereference any &T, because a safe reference guarantees that its contents are safe to read. That causes a read of uninitialized memory, with obviously drastic consequences.

6 Likes

No, this is UB. You can't use slice::from_raw_parts_mut with uninitialized data.

The correct and safe solution is:

buf.clear();
reader.by_ref().take(len).read_to_end(&mut buf)?;
assert_eq!(len, buf.len());
6 Likes

This concurs with the documentation of std::slice::from_raw_parts_mut:

pub unsafe fn from_raw_parts_mut<'a, T>(data: *mut T, len: usize) -> &'a mut [T] ⓘ

Safety

Behavior is undefined if any of the following conditions are violated:

  • […]

  • data must point to len consecutive properly initialized values of type T.

  • […]

One comment regarding terminology. The correct question(s) would IMHO be:

  1. Is it sound?
  2. Does it exhibit undefined behavior (UB)?

The Rust reference defines "unsound" as follows: "if unsafe code can be misused by safe code to exhibit undefined behavior, it is unsound."

But following this thread, the code is not just unsound (because safe code in the standard library could read from the mutable reference) but also immediately exhibits UB (even if the standard library is not changed).

Edit: Not sure if the latter is really true, see follow-ups by @alice.

If you ask "is it unsafe?" then the answer would be:

Yes, it's unsafe because you use unsafe. :laughing:

(I.e. it's always dangerous ("unsafe") to use unsafe.)

The current consensus, as outlined in the unsafe code guidelines repository here, is that the requirement that a &mut T must point to a valid T should be a safety invariant, rather than a validity invariant (see here for terminology). Being a safety invariant, creating a &mut T to an invalid T is not immediate language-level UB.

One of the motivations behind this is precisely to allow code like the one in the original post.

My link from earlier discusses this, but currently references are only annotated with dereferencable(N) rather than dereferencable(noundef N). That makes it is immediate UB to have a reference that is dangling, but not immediate UB to have a reference pointing at uninitialized memory.

4 Likes

Does that mean the documentation of std::slice::from_raw_parts_mut is slightly outdated (or rather: conservative)?

It doesn't distinguish between safety and validity invariants, which I suppose is somewhat conservative.

It's definitely not outdated as, whatever the current direction of the UCG happens to be, there has been no formal decision. Some might call it conservative if they're confident enough in the current direction of the UCG, but I would personally consider it irresponsible until there has been a formal decision.

Until the decision is formal, it is still undefined.

3 Likes

Ah, so the answer to the question whether the code is exhibiting undefined behavior is… :thinking: well… undefined. :laughing:

3 Likes

API documentation is only ever about safety invariants. Validity invariants are only relevant when interacting directly with language primitives. Most of the time, if you violate an API's "safety" documentation you are not guaranteed to cause language UB. It's a modularity thing. std::slice::from_raw_parts_mut says what the requirements are to produce a &[T] that can be passed to arbitrary safe code.

It is a separate question whether some functions like std::slice::from_raw_parts_mut want to also have promises regarding when they will not cause immediate UB. That should be a very rare thing generally speaking, because you basically need to have complete knowledge of the MIR generated by the function and everything it calls (or at least validity promises of the same nature for all of those functions). Furthermore having such documentation on the API could mislead users, if they confuse the safety and validity documentation. It is probably best to relegate those kind of docs to the nomicon or so.