Is it possible to read uninitialized memory without invoking UB?

alice · August 21, 2021, 5:14am

It's pretty unambiguous that the snippet does contain UB if the extern method doesn't initialize it. That it is "unoptimizable" doesn't really matter. UB allows the compiler to not optimize it.

In general having a spec say that something is unoptimizable is ... not great. The spec should simply specify how the program is guaranteed to behave, and there are much much better ways of specifying that than saying that it is unoptimizable.

newpavlov · August 21, 2021, 5:06pm

It means that the compiler must call this extern fn and cannot assume anything about pointers which were passed to or returned from it. The compiler cannot inline this call or look into how the function is implemented. In other words, extern fns act as black-box functions for the compiler. AFAIK that's exactly how they are handled today, but this behavior is not set in stone in the language spec (which I think it should be).

I am not sure why you use this example. There are far more common cases where the usual assumptions about memory can break. For example, memory which represents hardware state (breaks the first assumption), which is common in embedded, or memory aliased with other threads/processes/the kernel (breaks the second and third assumptions). Well, effectively all 3 assumptions are about aliasing. MADV_FREE simply represents the case where memory becomes aliased with the kernel.

But in my snippet none of those assumptions is broken!

freeze is an abstract-machine-level function. On real hardware it is a no-op. Freezing memory which is aliased with the kernel or other processes will not magically fix the broken assumptions. My point is that extern fns can act as freeze, albeit less effectively (since the compiler has to keep the call).

Treating an extern fn as a black box is perfectly reasonable behavior for a spec, in my opinion. Yes, it may prevent some optimizations, but how is it different from #[inline(never)]? We use this attribute specifically to disable one of the compiler's optimizations, thus making such a function "unoptimizable".

alice · August 21, 2021, 8:26pm

It's bad because it's so incredibly vague. The spec should say what behavior it guarantees in which situations instead of trying to implicitly control the resulting executable by talking about the algorithm with which it generates it.

As for #[inline(never)], there's a big difference between saying that one specific optimization shouldn't happen and saying that no optimizations may happen. The former is reasonable enough, but the latter is not really. (This is ignoring that, despite its name, #[inline(never)] is just a hint and doesn't actually guarantee that no inlining happens.)

CAD97 · August 21, 2021, 9:12pm

My point is that this isn't true!

The following is just a rough sketch, but the point is that using undefined values is always UB. It doesn't matter what the physical machine does, because the optimizer is working on the abstract machine semantics.

Consider the following program snippet. We'll be performing odd transformations that aren't clear optimizations to make it extra clear, but the same issues apply with actual optimization passes:

let _10: BigArray = extern_fn();
let _50: bool = pure_fn(&_10);
// lots of code ...
if _50 {
    // lots of code ...
}
// lots of code ...
if _50 {
    // lots of code ...
}

Importantly, BigArray is too big to pass in registers, so at the ABI level it might look like:

let _00: *const BigArray = alloc::<BigArray>();
let () = extern_fn(_00);
let _50: bool = pure_fn(_00);
// ...

You might complain that obviously BigArray should be on the stack, not the heap. My compiler says no, it's too big, it goes on the heap. You can't observe the difference anyway.

I hope you can see the issue at this point when extern_fn doesn't write anything to its return value, but let's make it even more painfully obvious:

let _00: *const BigArray = alloc::<BigArray>();
let () = extern_fn(_00);
// lots of code ...
let _50: bool = pure_fn(_00);
if _50 {
    // lots of code ...
}
// lots of code ...
let _51: bool = pure_fn(_00);
if _51 {
    // lots of code ...
}

After all pure_fn is pure, so we can call it twice and get the same result! Except if we've never written anything to the pointer (the memory behind it is uninitialized), we can read a different value, and end up taking only one of the two branches.

And if extern_fn returns a heap pointer directly, we have to do even less work to get UB: just read it twice, no need to think up an excuse to heap allocate.

I reference MADV_FREE because it's an easy example of two reads of uninitialized memory returning different values on a physical machine. The abstract machine (the optimizer) assumes that any value you use is not uninitialized, i.e. is a consistent value. If it's not, it's trivial to cause a contradiction via redundant reads.

The way that freeze translates into an actual machine operation is that it forces there to only be a single read. Any further use of the value must be by the frozen (consistent) value already read, where without the freeze the optimizer is free to eliminate the redundant store and just read the source (undefined) value more than once.

newpavlov · August 22, 2021, 1:32am

@CAD97

Your examples boil down to the following hypothesis: at the hardware level (i.e. we are not talking about the abstract machine anymore), subsequent reads of unwritten memory exclusively owned by a thread may produce different values. AFAIK that's not true, so your examples are perfectly fine and do not contain UB. Can you produce a program which would confirm your hypothesis?

Note that I consider MADV_FREE being the case of sharing page ownership with the kernel, so reading such page does not fulfill the exclusive ownership condition.

In other words, this abstract machine defines a set of assumptions which guide how the optimizer can transform code. When we have a mismatch between those assumptions and reality (or the code itself breaks some of those assumptions), we enter dragon territory, since code transformations may no longer be valid. This is what we commonly call UB.

My point is that the touch_pointer function removes the undef property from an affected pointer (assuming that extern fns always work as a black box), thus the compiler can no longer apply optimizations that depend on this flag.

Now the question is whether allocated (i.e. exclusively owned) but unwritten memory always produces the same result on reads if no values have been written into it (hence the hypothesis earlier). It's no longer a question about the abstract machine, but about properties of the target system. I can imagine a contrived system which returns 0x00 on reads from physically unmapped pages and the garbage stored in RAM if the page has been physically mapped. In other words, writing one byte into such a page can change the values of all bytes in it. On such a system both my snippets and yours would indeed contain UB, but AFAIK real systems do not behave like that.

What? Do you seriously claim that for the following code:

let buf: &mut [u8; 1 << 30] = freeze(uninit_alloc_1gb());
loop {
    let n = get_random(buf.len() - 16);
    do_stuff(&buf[n..][..16]);
}

The compiler will magically keep track of the regions that were read instead of generating "dumb" code that reads data directly from the buf pointer?

How is it vague if it's exactly how the compiler has to handle extern fns coming from shared libraries? My point is that, for the sake of consistency and predictability, the handling of extern fns should be the same even with static linking and LTO.

CAD97 · August 22, 2021, 3:35am

Except this is exactly the behavior that MADV_FREE leads to.

The sequence of operations:

1. Some memory page is deallocated with MADV_FREE
2. The page is not yet freed
3. A new allocation is done, which is mapped to the zero page
4. A read from that allocation is done, returning 0x00
5. A write to that page is done, causing a page fault and remapping that page to the page previously deallocated with MADV_FREE
6. A read from the allocation is done, now reading from the mapped page instead of zero

This also gets at the fact that the original question (can I read the allocated page to make sure it's zeroed) is kind of moot: the OS pager can do whatever it wants underneath you while the memory is uninitialized (so long as you let it). Often times when you map a page you don't get an actual page (overcommit) and just read zeroes until you write to the page, which forces an actual page to be mapped (which could've been previously mapped and have data in it on a system using MADV_FREE).

Well then you're requiring an environment that never uses MADV_FREE. I use it as my example because your "simple" code that "just" reads from the newly allocated page has to face the fact that it may be getting a previously allocated page, if one was freed up with MADV_FREE earlier. That's why uninitialized memory UB is so insidious: in many cases it will just "work" and "just" read some consistent value in physical memory. But in some edge case that is nearly impossible to track down and replicate you'll, I don't know, cross a page boundary and get a single non-deterministic byte.

newpavlov:

What? Do you seriously claim that for the following code:
let buf: &mut [u8; 1 << 30] = freeze(uninit_alloc_1gb());
loop {
    let n = get_random(buf.len() - 16);
    do_stuff(&buf[n..][..16]);
}
The compiler will magically keep track of the regions that were read instead of generating "dumb" code that reads data directly from the buf pointer?

No, that's not how freeze works (and that may be part of the misunderstanding).

As you've written, with Rust types, you have fn freeze(&mut [MaybeUninit<T>]) -> &mut [T]. This is not a thing, and cannot be done^[1].

What freeze is, is fn freeze(MaybeUninit<T>) -> T.

You can't freeze some big block of memory, you can just freeze the values you read out from the memory. And that's where freeze works as a barrier to prevent redundant spurious reads. If you write

let x: &mut MaybeUninit<u8> = alloc();
let xv: u8 = MaybeUninit::assume_init(*x);
take(xv);
take(xv);

It's a legal transformation to

let x: &mut MaybeUninit<u8> = alloc();
take(MaybeUninit::assume_init(*x));
take(MaybeUninit::assume_init(*x));

However, if you (theoretically) wrote

let x: &mut MaybeUninit<u8> = alloc();
let xv: u8 = MaybeUninit::freeze(*x);
take(xv);
take(xv);

then this would not be a valid transformation, precisely because two distinct reads of an uninitialized value could yield distinct results. Instead, the compiler is forced to do a single read and use that single value for both function calls.
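For comparison, here is a minimal sketch of the pattern that is sound in today's Rust: write the value before the first read, so there is never an uninitialized read to duplicate. The helper name `read_twice` is hypothetical and not from the thread.

```rust
use std::mem::MaybeUninit;

/// Hypothetical helper: initializes a slot, then uses the value twice.
fn read_twice(seed: u8) -> (u8, u8) {
    // The slot starts out uninitialized; reading it at this point would be UB.
    let mut slot: MaybeUninit<u8> = MaybeUninit::uninit();

    // Writing a value makes the memory initialized.
    slot.write(seed);

    // SAFETY: `slot` was written just above, so it holds a valid `u8`.
    let value: u8 = unsafe { slot.assume_init() };

    // Both uses see the same value, whether or not the compiler
    // decides to re-read the underlying memory.
    (value, value)
}
```

Because the memory is initialized before `assume_init`, the "one read or two reads" transformation discussed above cannot change the observable result.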

[1] Well, if you write every byte back then it can be done. Note that also, because you're writing to the page, it's now actually initialized. The writing back is important, because of things like MADV_FREE that make it so actual mapped pages are sensitive to if they've been written to.

If you know the exact OS, paging algorithm, and allocation system you're working with, and you control the entire system's context so that nothing weird happens out from under you, it might be possible to reduce the number of writes. Instead of writing all of the bytes back, you might need only one write per page (which I think is enough for the specific case of MADV_FREE-caused non-deterministic reads), or no writes at all (with a pager/allocator with no overcommit that always maps a real page). But if you want to be portable to all current and future machine configurations, you just have to tolerate that uninitialized memory is uninitialized.
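The "write every byte" route from the footnote can be sketched in Rust roughly as follows. This is only an illustration under stated assumptions: the helper name `fill_and_init` is hypothetical, and it overwrites the buffer with a known fill value rather than preserving whatever was there, since reading the old contents first would itself be the uninitialized read under discussion.

```rust
use std::mem::MaybeUninit;

/// Hypothetical helper: overwrites every byte of `buf` with `fill`,
/// then exposes the now-initialized buffer as `&mut [u8]`.
fn fill_and_init(buf: &mut [MaybeUninit<u8>], fill: u8) -> &mut [u8] {
    // Touch every element so that no byte is left uninitialized.
    for slot in buf.iter_mut() {
        slot.write(fill);
    }

    // SAFETY: every element was written in the loop above, and
    // `MaybeUninit<u8>` has the same size and alignment as `u8`.
    unsafe { &mut *(buf as *mut [MaybeUninit<u8>] as *mut [u8]) }
}
```

After the call, the returned slice can be read freely, because the memory is no longer uninitialized.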

There are specific cases where you can extract stronger guarantees, such as that the memory has in fact been initialized by someone (which could be the allocator through e.g. calloc), and in that case you can just read the memory. It's just not portable to when you're dealing with actually uninitialized memory.
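As a sketch of the "initialized by the allocator" case, Rust's standard `std::alloc::alloc_zeroed` plays the role of calloc: the returned bytes are guaranteed to be zero, so reading them is fine. The helper name `sum_of_zeroed` is hypothetical and only serves the illustration.

```rust
use std::alloc::{alloc_zeroed, dealloc, handle_alloc_error, Layout};

/// Hypothetical helper: allocates `len` zeroed bytes, sums them, and frees them.
fn sum_of_zeroed(len: usize) -> u64 {
    // The global allocator must not be asked for a zero-sized allocation.
    assert!(len > 0, "len must be non-zero");
    let layout = Layout::array::<u8>(len).expect("layout overflow");

    unsafe {
        let ptr = alloc_zeroed(layout);
        if ptr.is_null() {
            handle_alloc_error(layout);
        }

        // SAFETY: `alloc_zeroed` returned `len` zero-initialized bytes,
        // so viewing them as `&[u8]` reads only initialized memory.
        let bytes = std::slice::from_raw_parts(ptr, len);
        let sum: u64 = bytes.iter().map(|&b| u64::from(b)).sum();

        // SAFETY: `ptr` came from `alloc_zeroed` with this same layout.
        dealloc(ptr, layout);
        sum
    }
}
```

Swapping `alloc_zeroed` for plain `alloc` would turn the same read into a read of uninitialized memory, which is the case the rest of this post is about.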

You can argue that e.g. padding bytes aren't uninitialized, and that reading them is sound. That's a defensible position to hold, since padding bytes are uninit such that they can get spuriously read and written by the compiler. Per Rust's current guarantees, they're fully uninit and treating them as init is instant UB. I'd provide a counterargument of a padding over the size of a page plus MADV_FREE leading to problems again.

But that's not arguing that reading uninitialized memory is sound (it isn't). It's arguing that in this specific case, where the specification says the memory is uninitialized, it has actually been initialized, so it's fine to treat it as initialized. And if it has been initialized, that's potentially fine (so long as you're comfortable with it getting clobbered).

But the important point here is that uninitialized memory is UB, full stop.

A guarantee that the memory is initialized doesn't make reading uninitialized memory not UB. It makes the memory not uninitialized.

Reading uninitialized memory is UB.

riking · August 23, 2021, 7:24am

The code in #1 and #28 cannot possibly touch memory-mapped IO segments. Owned heap and stack and & referenced memory can only be main memory.

https://docs.opentitan.org/doc/ug/rust_for_c/#fnref:66 - footnote 66 - referencing the fact that spurious reads to & referenced memory are legal, which would badly mess with MMIO regions.

However, main memory can be MADV_FREE.

