Uninitialized memory safety question

My potentially simplistic understanding of safety regarding references was that memory needs to be initialized before you can even create a reference to it, and that this applies even to primitives with no invalid bit patterns. However, while playing around in the Rust Playground trying to build minimal examples of some internal code, I came up with this example. I expected Miri to complain about it, but it doesn't, seemingly because we write to the memory before reading from it. Is this code UB or not?

```rust
fn main() {
    let n = 4;
    let mut data: Vec<u16> = Vec::with_capacity(n);
    let slice = unsafe {
        core::slice::from_raw_parts_mut(
            data.as_mut_ptr(),
            data.capacity()
        )
    };
    for i in 0..n {
        slice[i] = 0;
    }
    dbg!(slice);
}
```

From the documentation of from_raw_parts_mut in std::slice:

data must be valid for both reads and writes

So your code is UB.
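For comparison, here is a minimal sketch of one way to fill the buffer without ever creating a reference to uninitialized memory, assuming the goal is just a Vec<u16> of n zeros (in which case vec![0u16; n] is the simplest safe option). The helper name zeroed_vec is made up for illustration: each element is written through the raw pointer, set_len is called afterwards, and only then are the elements borrowed as a slice.

```rust
/// Hypothetical helper: builds a Vec of `n` zeros without ever creating
/// a reference to uninitialized memory. (Safe equivalent: `vec![0u16; n]`.)
fn zeroed_vec(n: usize) -> Vec<u16> {
    let mut data: Vec<u16> = Vec::with_capacity(n);
    let ptr = data.as_mut_ptr();
    for i in 0..n {
        // SAFETY: i < n <= capacity, so `ptr.add(i)` stays inside the
        // allocation. `write` stores the value without reading the old,
        // uninitialized contents, and no `&`/`&mut` is created here.
        unsafe { ptr.add(i).write(0) };
    }
    // SAFETY: the loop above initialized the first `n` elements.
    unsafe { data.set_len(n) };
    data
}

fn main() {
    let mut data = zeroed_vec(4);
    // Only now, after initialization, is it fine to borrow the elements.
    let slice: &mut [u16] = data.as_mut_slice();
    dbg!(slice);
}
```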

That is what I thought as well. But I'm curious whether someone could confirm that Miri's behavior here is then just a false negative.

Also, if you click through to that "valid" link, I don't see any mention of initialization as a requirement for a pointer to be valid.

The docs do say the following, which is why I assume my example is UB. But it feels like Miri should then be able to detect it (though I know very little about Miri internals):

data must point to len consecutive properly initialized values of type T.

So I was wondering whether "initialized" in the docs here is used with the same semantics as memory initialization.

I can't speak to this case specifically, but Miri does generally have false negatives, or rather, UB it's not yet looking for. That means the lack of reported UB does not imply a lack of UB.


I think one question is: do you ever intend to use this code where:

  • n != 4
  • T != u16
  • overwrites != 0
  • the compiler version or optimization level differs from the current Rust Playground?

Because all Miri has shown is that nothing was detected with this particular configuration; it says nothing about n = 5, T = u32, overwriting with 1, a newer compiler version, or a different opt level. In any of those cases, the compiler may or may not perform optimizing rewrites that depend on the safety preconditions of from_raw_parts_mut, which your code has violated.

Yeah, I'm aware that this is just one specific example, and I expect that it is in fact UB. But that's exactly why I'm asking: to clarify my understanding. I.e., is it in fact okay to create a reference to uninitialized memory for types such as u8 or u16, which have no padding and no invalid bit patterns, if we always fully write the data behind the reference before reading it?

No, it's not okay. From the compiler's point of view, creating a reference is fully equivalent to reading the data, even if no read is actually performed.


I think the documentation is pretty clear: it is NOT okay. Imagine a machine with 9-bit bytes, or one with an error-checking parity bit for each byte. On such a machine, there could be invalid bit patterns for a u8. Hope this helps!

[A separate question is WHY it is UB; that would involve questions about compiler optimisations which I do not know the answer to.]

Here’s an up-to-date summary…

The status of references to uninit memory is undecided. We document them as UB in the Reference so that we can make this decision without code already relying on an outcome. Miri does not flag this UB because we are not sure if we really want to rule out all that code. The compiler does not actually make them UB, and the standard library can rely on that, but user code cannot.

Also see the discussion in

[…]

My own position is that this should not be UB, except when the reference points to an uninhabited type; &! should itself be considered uninhabited.
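Since references to uninit memory are, per the summary above, documented as UB for user code for now, the usual workaround is to stay with raw pointers until everything has been written. A minimal sketch, using a hypothetical Pair struct and make_pair function: std::ptr::addr_of_mut! yields a raw pointer to each field without creating an intermediate reference, and MaybeUninit holds the storage until assume_init.

```rust
use std::mem::MaybeUninit;

// Hypothetical struct, used only for illustration.
#[derive(Debug, PartialEq)]
struct Pair {
    a: u16,
    b: u32,
}

// Hypothetical constructor that initializes a Pair field by field.
fn make_pair(a: u16, b: u32) -> Pair {
    let mut slot = MaybeUninit::<Pair>::uninit();
    let p: *mut Pair = slot.as_mut_ptr();
    // SAFETY: `p` points to properly aligned storage for a Pair.
    // `addr_of_mut!` computes each field's address without creating
    // a `&mut` to the (still uninitialized) field or struct.
    unsafe {
        std::ptr::addr_of_mut!((*p).a).write(a);
        std::ptr::addr_of_mut!((*p).b).write(b);
    }
    // SAFETY: every field has now been initialized (padding bytes,
    // if any, do not need to be).
    unsafe { slot.assume_init() }
}

fn main() {
    let pair = make_pair(1, 2);
    println!("{:?}", pair);
}
```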


My intuition is that this is also safe, and that it is slightly different from the case in the link you've shared.

In this instance, the question is not exactly about creating references to uninitialized memory, but about whether it is valid to pass from_raw_parts_mut a raw pointer to memory that may never have been written to.

We are not creating the Rust reference; the call to from_raw_parts_mut is. My assumption is that a raw pointer passed to this function is always assumed to be "initialized" from the compiler's point of view (at compile time).

The documentation specifies:

`data` must point to `len` consecutive properly initialized values of type `T`

I believe this is a misuse of the word "initialized" in the context of compile time and UB. To me, it sounds like it should be read more like "the values/contents must be valid bit patterns for T".


What exactly do you mean by this? The pointer itself is a value of type *mut T, which is guaranteed to be initialized. You cannot create an uninitialized value in valid Rust.

The pointer must point to initialized memory with valid bit patterns. Why are you referring to compile time?

I’m not entirely sure what “also” refers to, but the point I quoted was that it’s undecided, but not safe for user code.

I don’t follow this distinction. If anything, referring to the fact that we don’t directly create the reference to uninitialized memory, but let from_raw_parts_mut do it, makes a stronger argument that it should be undefined behavior if the documented safety invariants are violated.

Make sure to be clear about what you mean by “assumed […] at compile time”. If something is assumed at compile time but not actually true on the abstract machine at run-time, that is the very definition of undefined behavior.

For most types in Rust, uninitialized memory does not constitute a valid bit pattern of a value of type T. For some types it does, such as MaybeUninit<u8>. But for a type like u8, even though every initialized bit pattern is valid, uninitialized memory is still invalid.
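A minimal sketch of that distinction in practice, using a hypothetical fill_bytes helper: an array of MaybeUninit<u8> may legitimately start out uninitialized, and references into it have type &mut MaybeUninit<u8> rather than &mut u8. The final conversion to [u8; 4] is only sound because every byte has been written first.

```rust
use std::mem::MaybeUninit;

/// Hypothetical helper: fills a buffer of possibly-uninitialized bytes
/// and returns it as a plain `[u8; 4]`.
fn fill_bytes(value: u8) -> [u8; 4] {
    // Uninitialized memory is a valid value of MaybeUninit<u8>, so this
    // array (and references into it) is fine.
    let mut buf = [MaybeUninit::<u8>::uninit(); 4];
    for slot in buf.iter_mut() {
        // `slot` is `&mut MaybeUninit<u8>`, not `&mut u8`.
        slot.write(value);
    }
    // SAFETY: every element was written above, so each MaybeUninit<u8>
    // now holds an initialized u8.
    buf.map(|b| unsafe { b.assume_init() })
}

fn main() {
    println!("{:?}", fill_bytes(7));
}
```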

The argument that you might be trying to make here is that uninitialized memory doesn’t exist at run-time, so there’s never any harm in convincing the compiler that some memory was initialized at compile-time. However, this is wrong: the correctness of your program must always be argued in terms of the abstract machine, not the concrete lowering into, for instance, x86 assembly. For uninitialized memory, IIRC there are even some OS-related effects (about conditionally released pages or something like that) that can make uninitialized memory behave weirdly in practice, even in the absence of compiler optimizations. But that example shouldn’t be needed in the first place; the argument that is always relevant for undefined behavior, namely ensuring that existing and future compiler optimizations will work correctly, should be sufficient.

Feel free to also read this article on uninitialized memory if you haven’t already (the author is the same person who posted the summary I quoted above):


Thanks for the link @steffahn. That explains what I was seeing with Miri. Much appreciated.

This is a complicated and not-yet-fully-resolved question.

You probably want to read https://www.ralfj.de/blog/2018/08/22/two-kinds-of-invariants.html

It's definitely a violation of the safety invariant for a reference to point to uninitialized memory, because other safe code to which you pass the reference is allowed to read it. But that's a soundness problem, not an immediate-UB issue, which is why Miri doesn't find it.

It's still unclear whether the existence of a reference to invalid bits will be insta-UB or not. As I understand it, the leaning is towards "no".
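To make the safety-invariant point concrete, here is a minimal sketch of a hypothetical init_slice helper that only hands out a &mut [u16] once every element is initialized, so safe code receiving the reference can never observe uninitialized data. It reuses the same std::slice::from_raw_parts_mut as the original question, but only after the memory has been written through MaybeUninit.

```rust
use std::mem::MaybeUninit;

/// Hypothetical helper: initializes every element of `buf`, then hands
/// out a `&mut [u16]`. Safe callers can never observe uninitialized data,
/// because the typed reference only exists after the loop has finished.
fn init_slice(buf: &mut [MaybeUninit<u16>], value: u16) -> &mut [u16] {
    for slot in buf.iter_mut() {
        slot.write(value);
    }
    let len = buf.len();
    // SAFETY: MaybeUninit<u16> has the same layout as u16, and every
    // element was initialized by the loop above.
    unsafe { std::slice::from_raw_parts_mut(buf.as_mut_ptr().cast::<u16>(), len) }
}

fn main() {
    let mut storage = [MaybeUninit::<u16>::uninit(); 4];
    let s = init_slice(&mut storage, 9);
    s[0] = 1;
    println!("{:?}", s); // prints [1, 9, 9, 9]
}
```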


My bad, I mis-read your comment.

True, I'm specifically only talking about types that are Pod / AnyBitPattern style. Should've mentioned that! I do agree that the contents of the memory must be valid for T for it to be sound.

Basically, what I was trying to say is that I can't imagine how undefined behavior could arise from such a use of from_raw_parts_mut. In the compiler's eyes, we are passing an arbitrary raw pointer (it might come from C FFI and never have been written to, it might be paged out by the OS, it might not be truly allocated yet and trigger a page fault on first read/write, etc.) and converting it to a Rust reference. There is no way to know at compile time whether this raw pointer has ever been written to, which means the Rust reference has no option but to be considered a reference to initialized data, no?


This sounds as if you’re assuming that raw pointers live outside of Rust’s abstract machine model. If that’s the case, that’s the gap in your argument. The compiler can and does reason about what a raw pointer points to for optimization purposes. This is especially true when, as in this case, the pointer comes from the standard global Rust allocator: the compiler may very well use the fact that pointers returned by the allocator always start out pointing to uninitialized memory.


Edit: Or maybe the argument is that, because external raw pointers are possible, the function core::slice::from_raw_parts_mut cannot possibly be optimized in ways that break if the pointer points to uninitialized memory. That doesn’t work either, because the function that calls core::slice::from_raw_parts_mut doesn’t get translated independently from the implementation of core::slice::from_raw_parts_mut itself.

E.g. it doesn’t suffice to look at the assembly of core::slice::from_raw_parts_mut, determine that there was no misoptimization in there, and then ignore any UB it might have executed because the assembly is correct. For instance, the call might be inlined into the caller, or the compiler might have annotated the function’s ABI in ways that cause callers to perform optimizations that are only valid if the pointer passed to it points to initialized memory, etc.

As far as I’m aware, the only way inspecting assembly can justify “acceptable” UB is as follows: if you’re creating a binary (application, dynamic library, etc.) and you inspect the assembly of the whole binary after compilation to make sure it actually does what it’s supposed to do, then there’s no harm in using that binary. There’s still harm in re-compiling the code (especially with a different compiler version) unless you re-audit the whole binary (literally the whole thing, including parts that don’t seem related to the functions where the UB “happens”).


(continuing the first section, from before the “Edit:”)

(Even when the pointer comes from C via FFI, I believe that LTO between Rust and C might be possible in ways that enables LLVM to reason about pointers across the boundary, though I’m not 100% sure on that.)

One place where such an argument might perhaps work is inline assembly that reads from uninitialized memory. That might be something that’s truly outside of the compiler’s abstract machine.


Interesting... Thanks for the insights 🙏

I'm slightly mind-blown that the compiler can go as far as tracking whether certain arbitrary raw pointers have been written to or not. I guess it is theoretically possible to track that a raw pointer is derived from a variable, and also to track which byte offsets into that variable have been written to... I always imagined raw pointers to be a black box, opaque to the compiler, but I can't find anywhere that explicitly says so!


Good to hear you find my answers useful.

This is not an uncommon (mis)interpretation. Ralf Jung also has three interesting blog posts about pointers; you might want to take a look at those too 😉


Note that Vec::spare_capacity_mut exists to access the uninitialized tail of a Vec.
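A sketch of the original example rewritten with Vec::spare_capacity_mut, assuming the intent is to fill the first n elements and then use them as a normal Vec. The function name zeros_via_spare_capacity is hypothetical. Unlike calling from_raw_parts_mut over the whole capacity, this only ever creates references of type &mut MaybeUninit<u16> to the uninitialized tail; set_len then marks the written elements as part of the Vec.

```rust
/// Hypothetical helper: fills the first `n` elements of a fresh Vec through
/// `spare_capacity_mut`, which exposes the uninitialized tail as
/// `&mut [MaybeUninit<u16>]`.
fn zeros_via_spare_capacity(n: usize) -> Vec<u16> {
    let mut data: Vec<u16> = Vec::with_capacity(n);
    // The spare capacity may be larger than n; only the first n slots are used.
    for slot in &mut data.spare_capacity_mut()[..n] {
        slot.write(0);
    }
    // SAFETY: n <= capacity, and the first n elements of the spare
    // capacity were initialized by the loop above.
    unsafe { data.set_len(n) };
    data
}

fn main() {
    let data = zeros_via_spare_capacity(4);
    dbg!(&data);
}
```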


Tracking arbitrary raw pointers quickly runs into the Halting Problem, and is thus infeasible. But tracking some raw pointers is definitely feasible, and LLVM routinely does it. After all, Rust is basically the only language which makes a hard distinction between safe references and raw pointers. C++ references are somewhere in the middle, and C has only raw pointers. If you want to optimize C code (and you definitely want that), then you must implement some raw pointer tracking.

In particular, allocating fresh memory and doing something with it is definitely within the range of compiler capabilities. The compiler will know if you try to read uninitialized freshly allocated memory! Whether it does something with that knowledge is a separate concern.

Also, uninitialized memory isn't just a compiler notion; it actually exists at runtime. The OS tracks whether you have written to a page of memory, and can return inconsistent or zeroed pages if you try to read from a page before writing to it. Hardware also does some write tracking. It must, in order to ensure consistency of writes to the same memory from different processing units! This means it can behave in unpredictable ways if you try to read from memory that wasn't written to. Even RAM can behave inconsistently: for power-efficiency reasons, segments that were not written to may not be refreshed, causing their contents to behave erratically over time.
