Is it possible to read uninitialized memory without invoking UB?

matklad · August 3, 2021, 10:32am

Fascinating question! To answer it directly: no, if the memory is uninitialized, reading it is UB.

But I think your underlying question cuts much deeper. "Uninitialized" is a concept of the abstract machine, but syscalls exist outside of that. So the question can be reformulated: how do syscalls (and mmap in particular) interact with the abstract machine?

The following is my understanding; it's not necessarily correct.

Here's a non-exhaustive list of things an evil/buggy OS can do when asked to map and zero-init a page of memory:

give you a zeroed page of memory
give you a page of memory with some existing data
give you a zeroed page of memory, but, after 10 milliseconds, change its mind and remap this virtual page to a different physical page. Alternatively, the OS could overwrite the contents of the page in place
give you a page of memory which aliases some existing page (e.g. the one containing the call stack)

To plug that into the abstract machine, we need to make sure that the semantics of the syscall correspond to the AM's understanding of what memory is. And that, I think, is roughly:

if you read a byte from an address, you'll read the same byte later
if you write a byte to an address, you'll read the same byte later
memory is not aliased -- writes/reads to/from any address affect only the contents of this address, and, conversely, they are the only operations that affect the contents of the address.

If what you get out of mmap fulfills those properties, then you can treat it as memory. In particular (provided a correct implementation of mmap), the following program is correct:

Select "zero out" flag at runtime at random
mmap a new page which might, or might not, be zeroed
assert that the page is zero (this may panic, but will not be UB)

That is, even a non-zeroing mmap is guaranteed to return distinct memory which holds some value and won't change from under your feet. This is in contrast to, e.g., uninitialized local variables, whose storage (while they are uninitialized) might be used by the compiler as scratch space.
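For comparison, here is a minimal sketch of the local-variable case, assuming `std::mem::MaybeUninit` is used to make the uninitialized state explicit. The helper name `init_then_read` is made up for illustration. The point is only that the read happens after a write, so the compiler never sees a read of storage it knows to be uninitialized.

```rust
use std::mem::MaybeUninit;

/// Hypothetical helper: creates an uninitialized local, initializes it,
/// and only then reads it back.
fn init_then_read(value: u32) -> u32 {
    // As far as the abstract machine is concerned, `slot` holds no value yet.
    // Calling `assume_init` at this point would be UB.
    let mut slot = MaybeUninit::<u32>::uninit();

    // Writing a value makes the storage initialized.
    slot.write(value);

    // SAFETY: `slot` was fully initialized by the `write` above.
    unsafe { slot.assume_init() }
}
```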

That's the core difference -- if the compiler knows that the stuff is uninitialized, it can play tricks. When you call a random external C function/syscall that "returns memory", the compiler can't assume that it is uninitialized. If you call raw mmap, that's what happens: there's nothing telling the compiler that the newly returned memory can be used as scratch space.

In contrast, if you call the language's own memory allocation function (alloc::alloc), that can be an explicit part of the abstract machine. The compiler is allowed to know that the returned value is allocated but uninitialized memory, and can optimize based on that.
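As a rough sketch of what that means in practice, assuming `std::alloc::alloc` as the allocation function: the bytes it returns have to be written before they are read. The helper `alloc_fill_sum` below is hypothetical and exists only to show the ordering (allocate, initialize, read, free).

```rust
use std::alloc::{alloc, dealloc, Layout};

/// Hypothetical helper: allocates `len` bytes, fills them with `fill`,
/// sums them, and frees the allocation.
fn alloc_fill_sum(len: usize, fill: u8) -> u64 {
    assert!(len > 0, "`alloc` must not be called with a zero-sized layout");
    let layout = Layout::array::<u8>(len).expect("layout overflow");

    // SAFETY: `layout` has a non-zero size.
    let ptr = unsafe { alloc(layout) };
    assert!(!ptr.is_null(), "allocation failed");

    // The memory is allocated but uninitialized here; reading it would be UB.
    // SAFETY: `ptr` is valid for writes of `len` bytes.
    unsafe { std::ptr::write_bytes(ptr, fill, len) };

    // SAFETY: all `len` bytes were initialized by `write_bytes` above.
    let sum: u64 = unsafe { std::slice::from_raw_parts(ptr, len) }
        .iter()
        .map(|&b| u64::from(b))
        .sum();

    // SAFETY: `ptr` came from `alloc` with this same `layout`.
    unsafe { dealloc(ptr, layout) };
    sum
}
```

If zeroed memory is what you want from the allocator, `std::alloc::alloc_zeroed` expresses that directly.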

Finally, you don't have to treat the result of mmap as memory. Consider a program which probabilistically maps a page either to physical memory or to a random device which returns random values for loads. For such a program,

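// `lo` and `hi` are the start and one-past-the-end addresses of the mapping.
let mut p: *const u8 = lo;
while p < hi {
    // Ordinary (non-volatile) load: the compiler assumes `p` points to memory.
    let byte = unsafe { *p };
    assert!(byte == 0);
    // Advance by one byte.
    p = unsafe { p.add(1) };
}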

would be UB. A random device is not memory, and it breaks the compiler's assumption that repeated reads of the same address return the same value.

Instead, you can treat the addresses as non-memory, using volatile reads:

    let byte = unsafe { core::ptr::read_volatile(p) };

With volatile, this won't be UB.
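Put together, a volatile version of the whole loop might look like the sketch below. The function name `all_zero_volatile` and the `(pointer, length)` signature are assumptions made for illustration; they are not taken from the thread.

```rust
use core::ptr;

/// Hypothetical helper: checks whether `len` bytes starting at `lo` are all
/// zero, using a volatile load for every byte.
///
/// # Safety
/// `lo` must be valid for volatile reads of `len` consecutive bytes.
unsafe fn all_zero_volatile(lo: *const u8, len: usize) -> bool {
    for i in 0..len {
        // SAFETY: the caller guarantees `lo..lo + len` is readable.
        // Each volatile load is performed exactly as written; the compiler
        // may not merge or elide it, nor assume that two loads agree.
        let byte = unsafe { ptr::read_volatile(lo.add(i)) };
        if byte != 0 {
            return false;
        }
    }
    true
}
```

Calling it on an ordinary buffer, for example `unsafe { all_zero_volatile(buf.as_ptr(), buf.len()) }`, also works, since volatile reads are fine on normal memory too.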

Practically, I'd just memzero the memory myself if I don't trust the OS. A debug-assert loop would be fine as well.
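Both options could be sketched roughly as follows. The helper names `zero_region` and `debug_assert_zeroed` are made up for this example, and the second one assumes the region really behaves like memory in the sense described above.

```rust
/// Hypothetical helper: overwrites `len` bytes starting at `ptr` with zeros.
///
/// # Safety
/// `ptr` must be valid for writes of `len` bytes.
unsafe fn zero_region(ptr: *mut u8, len: usize) {
    // SAFETY: the caller guarantees the region is writable.
    unsafe { core::ptr::write_bytes(ptr, 0, len) };
}

/// Hypothetical helper: in debug builds, panics if any byte in the region is
/// non-zero. In release builds the `debug_assert_eq!` checks are compiled out.
///
/// # Safety
/// `ptr` must be valid for reads of `len` bytes, and the region must behave
/// like ordinary memory that holds some value.
unsafe fn debug_assert_zeroed(ptr: *const u8, len: usize) {
    for i in 0..len {
        // SAFETY: the caller guarantees the region is readable.
        let byte = unsafe { ptr.add(i).read() };
        debug_assert_eq!(byte, 0, "byte at offset {} is not zero", i);
    }
}
```

The first helper costs one extra pass over the page but removes any dependence on what the OS did; the second costs nothing in release builds but only reports the problem instead of fixing it.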
