Is it possible to read uninitialized memory without invoking UB?

Fascinating question! To answer it directly: no, if the memory is uninitialized, reading it is UB.

But I think your underlying question cuts much deeper. "Uninitialized" is a concept of the abstract machine, but syscalls exist outside of that. So the question can be reformulated: how do syscalls (and mmap in particular) interact with the abstract machine?

The following is my understanding; it's not necessarily correct.

Here's a non-exhaustive list of things an evil/buggy OS can do when asked to map and zero-init a page of memory:

  • give you a zeroed page of memory
  • give you a page of memory with some existing data
  • give you a zeroed page of memory, but, after 10 milliseconds, change its mind and remap this virtual page to a different physical page. Alternatively, the OS could overwrite the contents of the page in place
  • give you a page of memory which aliases some existing page (e.g. the one containing the call stack)

To plug that into the abstract machine, we need to make sure that the semantics of the syscall correspond to the AM's understanding of what memory is. And that, I think, is roughly:

  • if you read a byte from an address, you'll read the same byte later (unless you write to that address in between)
  • if you write a byte to an address, you'll read the same byte later
  • memory is not aliased -- writes/reads to/from an address affect only the contents of that address, and, conversely, they are the only operations that affect the contents of that address.

If what you get out of mmap fulfills those properties, then you can treat it as memory. In particular (provided a correct implementation of mmap), the following program is correct:

  1. Select "zero out" flag at runtime at random
  2. mmap a new page which might, or might not, be zeroed
  3. assert that the page is zero (this may panic, but will not be UB)

That is, even a non-zeroing mmap is guaranteed to return distinct memory which holds some value and won't change from under your feet. This is in contrast to, e.g., uninitialized local variables, whose storage (while they are uninitialized) might be used by the compiler as scratch space.

That's the core difference -- if the compiler knows that the stuff is uninitialized, it can play tricks. When you call a random external C function/syscall that "returns memory", the compiler can't assume that it is uninitialized. If you call raw mmap, that's what happens: there's nothing telling the compiler that the newly returned memory can be used as scratch space.

In contrast, if you call the language's own memory allocation function (alloc::alloc), that can be an explicit part of the abstract machine. The compiler is allowed to know that the returned value is allocated but uninitialized memory, and can optimize based on that.
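To make the alloc::alloc case concrete, here is a minimal sketch of the "write before you read" discipline. The helper name alloc_filled is made up for illustration; the calls themselves are from std::alloc and std::ptr.

```rust
use std::alloc::{alloc, dealloc, handle_alloc_error, Layout};

/// Hypothetical helper: allocates `len` bytes with the language allocator,
/// initializes them, and only then reads them back.
fn alloc_filled(len: usize, fill: u8) -> Vec<u8> {
    // `alloc` must not be called with a zero-sized layout.
    assert!(len > 0, "len must be non-zero");
    let layout = Layout::array::<u8>(len).expect("layout overflow");
    unsafe {
        // As far as the abstract machine is concerned, these bytes are
        // uninitialized; reading them at this point would be UB.
        let p = alloc(layout);
        if p.is_null() {
            handle_alloc_error(layout);
        }
        // Writing initializes every byte.
        std::ptr::write_bytes(p, fill, len);
        // Only now is it fine to read the bytes.
        let out = std::slice::from_raw_parts(p, len).to_vec();
        dealloc(p, layout);
        out
    }
}
```

For example, alloc_filled(4, 7) returns a vector of four 7s. If zeroes are what you want, std::alloc::alloc_zeroed skips the explicit write.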

Finally, you don't have to treat the result of mmap as memory. Consider a program which probabilistically maps a page either to physical memory, or to a device which returns random values for loads. For such a program,

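let mut p: *const u8 = lo;
while p < hi {
    // Plain (non-volatile) load: the compiler assumes this is ordinary memory.
    let byte = unsafe { *p };
    assert!(byte == 0);
    p = unsafe { p.add(1) };
}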

would be UB. A random device is not memory, and it breaks the compiler's assumption that repeated reads of the same address return the same value.

Instead, you can treat the addresses as non-memory, using volatile reads:

    let byte = core::ptr::read_volatile(p);

With volatile, this won't be UB.
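As a sketch, the loop above rewritten with volatile loads could look like this. The function name all_zero_volatile is hypothetical, and the safety contract in the doc comment is my assumption about what the caller has to guarantee.

```rust
use std::ptr;

/// Hypothetical helper: returns true if every byte in `[lo, hi)` reads as zero.
/// Each byte is loaded exactly once with a volatile read.
///
/// # Safety
/// Every address in `[lo, hi)` must be mapped and readable.
unsafe fn all_zero_volatile(lo: *const u8, hi: *const u8) -> bool {
    let mut p = lo;
    while p < hi {
        // Volatile load: the compiler may not elide it, duplicate it, or
        // assume anything about the value it produces.
        let byte = unsafe { ptr::read_volatile(p) };
        if byte != 0 {
            return false;
        }
        // `wrapping_add` is a safe way to step a raw pointer.
        p = p.wrapping_add(1);
    }
    true
}
```

It returns a bool instead of asserting, so the caller decides whether a non-zero byte is a panic or just a reason to zero the range.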

Practically, I'd just zero the memory myself if I don't trust the OS. A debug-assert loop would be fine as well.
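Both options, sketched roughly. The names force_zero and debug_assert_zeroed are made up, and the second one assumes the range behaves like ordinary memory (i.e. mmap is implemented correctly), since it builds a slice over it.

```rust
use std::{ptr, slice};

/// Hypothetical helper: unconditionally zeroes `len` bytes starting at `p`.
///
/// # Safety
/// `p` must be valid for writes of `len` bytes.
unsafe fn force_zero(p: *mut u8, len: usize) {
    unsafe { ptr::write_bytes(p, 0, len) }
}

/// Hypothetical helper: in debug builds, panics if any byte in the range is
/// non-zero; in release builds it does nothing.
///
/// # Safety
/// `p` must be non-null and valid for reads of `len` bytes, and the range
/// must behave like ordinary memory.
unsafe fn debug_assert_zeroed(p: *const u8, len: usize) {
    if cfg!(debug_assertions) {
        let bytes = unsafe { slice::from_raw_parts(p, len) };
        assert!(bytes.iter().all(|&b| b == 0), "page is not zeroed");
    }
}
```

force_zero costs a pass over the page in every build, while debug_assert_zeroed only checks in debug builds and trusts the OS in release.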
