Is it possible to read uninitialized memory without invoking UB?

That code has UB, but UB allows anything to happen, and "does what I wanted" is included in "anything". You should just be aware that changes to the behavior are not considered breaking even if they break your code with UB.


Why would it not allow me to read it? freeze says that the page will have an arbitrary but unknown pattern, which would mean that the compiler can't remove checks based upon knowledge of the value.

I really want to avoid solutions that rely on the compiler being too dumb, cause eventually it will figure it out. Sort of the opposite of the "sufficiently smart compiler".

Assuming touch_pointer comes from a shared library, the compiler is in principle not allowed to make any assumptions about it or peek into its implementation. But yes, it may break with static linking and LTO.

I think the point that @scottmcm is trying to make is that the compiler may optimize the code below to always print hello world without actually reading the underlying value.

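let mut uninit = MaybeUninit::<u8>::uninit();

let ptr: *mut u8 = uninit.as_mut_ptr();
freeze(ptr); // hypothetical: Rust has no such function today

// Dereferencing a raw pointer requires an `unsafe` block.
if unsafe { *ptr } == 0 {
    println!("hello world!");
}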

This is because as described in the LLVM reference, the resulting value must be fixed, but can be arbitrary. The value 0 for the memory location is valid under those constraints.

According to my reading of the reference, it should be a fixed but unknown value. The "arbitrary" part refers to the fact that it could be anything, so the compiler can't optimize it by assuming that it is some particular value. The examples further support this: %z = add i32 %x, %x is always even, but %cmp = icmp eq i32 %x, %x2 can be true or false, and the compiler cannot tell which.

Of course, I am a novice and this is a really difficult subject so I would appreciate any other references supporting or refuting this stance.

The problem here is that if the compiler is ever smart enough to know that the memory is undef, then there's nothing you can do to detect that. At best you can use freeze to make your checks useless instead of UB.

All the approaches of things like the "optimization barriers" or "would mean that the compiler can't remove checks" are about forcing the compiler to be insufficiently smart to be able to actually know anything useful -- which makes those things also "work" fine without the freeze.

Make sure to go through https://llvm.org/docs/LangRef.html#undefined-values really carefully. In particular, note the difference between freeze(undef) and just undef.

If you have %A = undef and %B = freeze(undef), icmp %A, %A is undef but icmp %B, %B is i1 1. It's that repeated use where the freeze is relevant. If the only use of %B is in icmp %B, 0, then it can be optimized to just freeze(undef) -- it'll have a consistent answer, but the optimizer can pick whichever answer it would prefer.

(At least, that's my understanding. I'm no Ralf, though :upside_down_face:)
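The difference between undef and freeze(undef) can be sketched with a toy model. This is neither LLVM nor a real Rust API: Val, Chooser, use_val, freeze and icmp_eq_self are hypothetical names, and the Chooser merely stands in for the optimizer's freedom to resolve undef however it likes at each use.

```rust
/// Toy model of an LLVM-style value (hypothetical, for illustration only).
#[derive(Clone, Copy, Debug, PartialEq)]
enum Val {
    Concrete(u8),
    Undef,
}

/// Stands in for the optimizer's freedom: each call may yield a new value.
struct Chooser {
    next: u8,
}

impl Chooser {
    fn pick(&mut self) -> u8 {
        let v = self.next;
        self.next = self.next.wrapping_add(1);
        v
    }
}

/// Every use of `Undef` is resolved independently of every other use.
fn use_val(v: Val, c: &mut Chooser) -> u8 {
    match v {
        Val::Concrete(b) => b,
        Val::Undef => c.pick(),
    }
}

/// `freeze` resolves `Undef` exactly once; the result is concrete afterwards.
fn freeze(v: Val, c: &mut Chooser) -> Val {
    Val::Concrete(use_val(v, c))
}

/// Models `icmp eq %v, %v`: the two operands are two separate uses.
fn icmp_eq_self(v: Val, c: &mut Chooser) -> bool {
    use_val(v, c) == use_val(v, c)
}
```

In this model, comparing Val::Undef with itself can come out false, while the same comparison on a frozen value is always true. Nothing constrains which concrete byte freeze settles on, which is why a single check against 0 tells you nothing.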


I would generally consider the word arbitrary to refer to the guarantees that are given to us, the authors of the code, and not to be about what the compiler itself is allowed to know.

I'm no expert in the LLVM reference, but this is how we use the word "arbitrary" in pure mathematics, and I assume it is used in the same way here.


In general, I don't think the LLVM reference ever says anything like "the compiler is not allowed to know X", and I think it would be bad design for a language reference to say something like that. There are always much better ways of writing this in the reference, e.g. "it is guaranteed that the frozen value is the value actually stored in RAM at that location" or something like that.


I think that this is actually an interesting case that deserves a little more examination.

As written, if we take your comment on the touch_pointer function to be correct, the code does have undefined behavior (as stated above by others).

What would happen if I replaced the C of touch_pointer with a function that did, in fact, initialize the 4096 bytes of memory pointed to? The memory would then be initialized, and the program would execute without UB.
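A minimal sketch of that variant, written in Rust rather than C. The names touch_pointer_init and read_first_byte are hypothetical, and the 4096-byte size is taken from the discussion above. Because every byte is written before it is read, the read is well defined.

```rust
use std::mem::MaybeUninit;

/// Hypothetical stand-in for `touch_pointer` that really initializes the page.
///
/// Safety: `ptr` must be valid for writes of `len` bytes.
unsafe fn touch_pointer_init(ptr: *mut u8, len: usize) {
    // Writes `len` zero bytes starting at `ptr`.
    unsafe { std::ptr::write_bytes(ptr, 0, len) }
}

fn read_first_byte() -> u8 {
    // Heap-allocated, uninitialized 4096-byte buffer.
    let mut page = Box::new(MaybeUninit::<[u8; 4096]>::uninit());
    let ptr = page.as_mut_ptr() as *mut u8;
    unsafe {
        touch_pointer_init(ptr, 4096);
        // Sound: the byte was written by the call above.
        *ptr
    }
}
```

Whether the read is sound depends entirely on what the called function does with the pointer, not on how opaque the call looks to the optimizer.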

You can't decide whether that snippet has UB or not without checking the actual definition of touch_pointer. This kind of whole program analysis is one of the major sticking points of C code that Rust set out to prevent. (Which is why you need unsafe to bring in this ambiguity.)

Another way of saying the same thing: the compiler can't decide whether that snippet has UB or not without checking the actual definition of touch_pointer. The "UB bomb" will "explode" when the veil of ignorance is stripped away.


In my opinion the snippet does not contain UB and it should not change in the presence of static linking and LTO. But I agree that it may.

UB is a matter of language specification, and since Rust does not have a proper specification the answer can go both ways. In my opinion, it's reasonable to treat extern functions as "unoptimizable" in the spec, no matter how they get linked and whether LTO is enabled. The same should apply to pages which are returned by memmap and other similar syscalls. I don't think that they should be included in the language specification (as memcpy is). If the compiler does not know anything about their semantics, then it has no choice but to treat them like any other unoptimizable extern fn.

Note that treating "raw" syscalls in this fashion does not prevent us from adding aliases for them to the spec, which would have semantic meaning for the compiler and thus could provide additional optimization opportunities, e.g. marking page memory returned by a hypothetical memmap alias as undef and marking the alias "pure", so that its call may be eliminated if the returned page is not used anywhere.

What does that mean in the spec, though?

While this may be true, remember that the language model is operational (defined by operations over state) and is combined with the as-if rule; it is not defined by some specification of what optimizations are allowed to do.

So sure, calling an extern fn that may potentially initialize the memory must behave properly in the case that the memory is initialized. But remember: MADV_FREE means that uninitialized memory can be non-deterministic in practice, not just poison in the abstract machine.

The manifestation of UB in this case would be because when you use the value, the compiler assumes that it's a normal value. But if it's uninitialized in a MADV_FREE page, two spurious reads may return different values the compiler assumes are equivalent, and you have UB manifesting because of it.

Is this manifestation rare and contrived? Kind of. Will it work fine most of the time? Sure. But importantly, it's still UB and it will still misbehave, and an optimization barrier doesn't prevent the UB from manifesting.

The only way to prevent miscompilations/misbehavior is to eliminate the UB of using an undefined value by freezing the value to a non-deterministic but consistent non-undef memory value.


It's pretty unambiguous that the snippet does contain UB if the extern method doesn't initialize it. That it is "unoptimizable" doesn't really matter. UB allows the compiler to not optimize it.

In general having a spec say that something is unoptimizable is ... not great. The spec should simply specify how the program is guaranteed to behave, and there are much much better ways of specifying that than saying that it is unoptimizable.


It means that the compiler must call this extern fn and cannot assume anything about pointers which were passed to or returned from it. The compiler cannot inline this call or look into how this function is implemented. In other words, extern fns act as black-box functions for the compiler. AFAIK it's exactly how they are handled today, but this behavior is not set in stone in the language spec (which I think it should be).

I am not sure why you use this example. There are far more common examples when the usual assumptions about memory can break. For example, memory which represents hardware state (breaks the first assumption) common in embedded or memory aliased with other threads/processes/kernel (breaks the second and third assumptions). Well, effectively all 3 assumptions are about aliasing. MADV_FREE simply represents the case when memory becomes aliased with kernel.

But in my snippet none of those assumptions is broken!

freeze is an abstract-machine-level function. On real hardware it is a no-op. Freezing memory which is aliased with the kernel or other processes will not magically solve the broken assumptions. My point is that extern fns can act as freeze, albeit less effectively (since the compiler has to keep the call).

Treating extern fn as a black box is a perfectly reasonable behavior for a spec in my opinion. Yes, it may prevent some optimizations, but how is it different from #[inline(never)]? We use this attribute to specifically disable one of compiler optimizations, thus making such function "unoptimizable".

It's bad because it's so incredibly vague. The spec should say what behavior it guarantees in which situations instead of trying to implicitly control the resulting executable by talking about the algorithm with which it generates it.

As for #[inline(never)], there's a big difference between saying that one specific optimization shouldn't happen and saying that no optimizations may happen. The former is reasonable enough, but the latter is not really. (This is ignoring that, despite its name, #[inline(never)] is just a lint and that it doesn't actually guarantee that no inlining happens.)


My point is that this isn't true!

The following is just a rough sketch, but the point is that using undefined values is always UB. It doesn't matter what the physical machine does, because the optimizer is working on the abstract machine semantics.

Consider the following program snippet. We'll be performing odd transformations that aren't clear optimizations to make it extra clear, but the same issues apply with actual optimization passes:

let _10: BigArray = extern_fn();
let _50: bool = pure_fn(&_10);
// lots of code ...
if _50 {
    // lots of code ...
}
// lots of code ...
if _50 {
    // lots of code ...
}

Importantly, BigArray is too big to pass in registers, so at an ABI level might look like:

let _00: *const BigArray = alloc::<BigArray>();
let () = extern_fn(_00);
let _50: bool = pure_fn(_00);
// ...

You might complain that obviously BigArray should be on the stack, not the heap. My compiler says no, it's too big, it goes on the heap. You can't observe the difference anyway.

I hope you can see the issue at this point when extern_fn doesn't write anything to its return value, but let's make it even more painfully obvious:

let _00: *const BigArray = alloc::<BigArray>();
let () = extern_fn(_00);
// lots of code ...
let _50: bool = pure_fn(_00);
if _50 {
    // lots of code ...
}
// lots of code ...
let _51: bool = pure_fn(_00);
if _51 {
    // lots of code ...
}

After all pure_fn is pure, so we can call it twice and get the same result! Except if we've never written anything to the pointer (the memory behind it is uninitialized), we can read a different value, and end up taking only one of the two branches.
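The contradiction can be made concrete with a small simulation. This is only a sketch: Mem, original and transformed are hypothetical names, and Mem::Flaky models uninitialized memory whose physical contents happen to change between reads (real hardware offers no such type).

```rust
use std::cell::Cell;

/// Hypothetical model of one memory location.
enum Mem {
    /// Initialized memory: every read returns the same value.
    Init(u8),
    /// Stand-in for uninitialized memory whose contents change between reads.
    Flaky(Cell<u8>),
}

impl Mem {
    fn read(&self) -> u8 {
        match self {
            Mem::Init(v) => *v,
            Mem::Flaky(c) => {
                let v = c.get();
                c.set(v ^ 1); // the next read observes a different value
                v
            }
        }
    }
}

/// "Pure" from the optimizer's point of view: it only reads its argument.
fn pure_fn(m: &Mem) -> bool {
    m.read() == 0
}

/// Program as written: one call, result reused for both branches.
fn original(m: &Mem) -> (bool, bool) {
    let b = pure_fn(m);
    (b, b)
}

/// Program after the transformation: the call is duplicated.
fn transformed(m: &Mem) -> (bool, bool) {
    (pure_fn(m), pure_fn(m))
}
```

For Mem::Init both versions agree, so the transformation is justified. For Mem::Flaky the transformed version can take one branch and skip the other, which the original program could never do.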

And if extern_fn returns a heap pointer directly, we have to do even less work to get UB: just read it twice, no need to think up an excuse to heap allocate.

I reference MADV_FREE because it's an easy example of two reads of uninitialized memory returning different values on a physical machine. The abstract machine (the optimizer) assumes that any value you use is not uninitialized, i.e. is a consistent value. If it's not, it's trivial to cause a contradiction via redundant reads.


The way that freeze translates into an actual machine operation is that it forces there to only be a single read. Any further use of the value must be by the frozen (consistent) value already read, where without the freeze the optimizer is free to eliminate the redundant store and just read the source (undefined) value more than once.


@CAD97

Your examples boil down to the following hypothesis: at the hardware level (i.e. we are not talking about the abstract machine anymore), subsequent reads of unwritten memory exclusively owned by a thread may produce different values. AFAIK that's not true, so your examples are perfectly fine and do not contain UB. Can you produce a program which would confirm your hypothesis?

Note that I consider MADV_FREE being the case of sharing page ownership with the kernel, so reading such page does not fulfill the exclusive ownership condition.

In other words, the abstract machine defines a set of assumptions that guide how the optimizer can transform code. When we have a mismatch between those assumptions and reality (or the code itself breaks some of those assumptions), we enter dragon territory, since code transformations may no longer be valid. This is what we commonly call UB.

My point is that the touch_pointer function removes the undef property from an affected pointer (assuming that extern fns always work as a black box), thus the compiler can no longer apply optimizations dependent on this flag.

Now the question is whether an allocated (i.e. exclusively owned), but unwritten memory, always produces the same result on reads if no values have been written into it (thus the hypothesis earlier). It's no longer a question of abstract machine, but about properties of a target system. I can imagine a contrived system which returns 0x00 on reading from physically unmapped pages and garbage stored in RAM if page has been physically mapped. In other words, writing one byte into such page can change values for all bytes in it. On such system my and your snippets would indeed contain UB, but AFAIK real systems do not behave like that.

What? Do you seriously claim that for the following code:

let buf: &mut [u8; 1 << 30] = freeze(uninit_alloc_1gb());
loop {
    let n = get_random(buf.len() - 16);
    do_stuff(&buf[n..][..16]);
}

The compiler will magically keep track of read regions instead of generating "dumb" code reading data directly from the buf pointer?

How is it vague if it's exactly how the compiler has to handle extern fns coming from shared libraries? My point is that, for the sake of consistency and predictability, handling of extern fns should be the same even with static linking and LTO.

Except this is exactly the behavior that MADV_FREE leads to.

The sequence of operations:

  • Some memory page is deallocated with MADV_FREE
  • The page is not yet freed
  • A new allocation is done, which is mapped to the zero page
  • A read from that allocation is done, returning 0x00
  • A write to that page is done, causing a page fault and remapping that page to the page previously deallocated with MADV_FREE
  • A read from the allocation is done, now reading from the mapped page instead of zero
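The sequence above can be sketched as a toy pager model. LazyPage and its fields are hypothetical and do not correspond to any real OS interface; the point is only that a read before the first write and a read after it can disagree for a byte the program never wrote.

```rust
/// Toy model of a lazily mapped page (not a real OS API).
struct LazyPage {
    /// `None` means the allocation is still backed by the shared zero page.
    backing: Option<Vec<u8>>,
    /// Stale contents of a page that was earlier released with MADV_FREE.
    stale: Vec<u8>,
}

impl LazyPage {
    fn new(stale: Vec<u8>) -> Self {
        LazyPage { backing: None, stale }
    }

    fn read(&self, i: usize) -> u8 {
        match &self.backing {
            Some(page) => page[i],
            None => 0, // zero page
        }
    }

    fn write(&mut self, i: usize, v: u8) {
        // The first write "faults" and maps the recycled page, stale data included.
        if self.backing.is_none() {
            self.backing = Some(self.stale.clone());
        }
        if let Some(page) = self.backing.as_mut() {
            page[i] = v;
        }
    }
}
```

With a stale page full of 0xAA, reading index 1 yields 0 before any write and 0xAA after a write to index 0, even though index 1 itself was never written by the program.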

This also gets at the fact that the original question (can I read the allocated page to make sure it's zeroed) is kind of moot: the OS pager can do whatever it wants underneath you while the memory is uninitialized (so long as you let it). Often times when you map a page you don't get an actual page (overcommit) and just read zeroes until you write to the page, which forces an actual page to be mapped (which could've been previously mapped and have data in it on a system using MADV_FREE).

Well, then you're requiring an environment that never uses MADV_FREE. I use it as my example because your "simple" code that "just" reads from the newly allocated page has to face the fact that it may be getting a previously allocated page, if one was freed up with MADV_FREE previously. That's why uninitialized memory UB is so insidious: in many cases it will just "work" and "just" read some consistent value in physical memory. But in some edge case that is nearly impossible to track down and replicate you'll, I don't know, cross a page boundary and have a single non-deterministic byte.

No, that's not how freeze works (and that may be part of the misunderstanding).

As you've written, with Rust types, you have fn freeze(&mut [MaybeUninit<T>]) -> &mut [T]. This is not a thing, and cannot be done[1].

What freeze is, is fn freeze(MaybeUninit<T>) -> T.

You can't freeze some big block of memory, you can just freeze the values you read out from the memory. And that's where freeze works as a barrier to prevent redundant spurious reads. If you write

let x: &mut MaybeUninit<u8> = alloc();
let xv: u8 = MaybeUninit::assume_init(*x);
take(xv);
take(xv);

It's a legal transformation to

let x: &mut MaybeUninit<u8> = alloc();
take(MaybeUninit::assume_init(*x));
take(MaybeUninit::assume_init(*x));

However, if you (theoretically) wrote

let x: &mut MaybeUninit<u8> = alloc();
let xv: u8 = MaybeUninit::freeze(*x);
take(xv);
take(xv);

then this would not be a valid transformation, precisely because two distinct reads of an uninitialized value could yield distinct results. Instead, the compiler is forced to do a single read and use that single value for both function calls.


[1] Well, if you write every byte back then it can be done. Note that also, because you're writing to the page, it's now actually initialized. The writing back is important, because of things like MADV_FREE that make it so actual mapped pages are sensitive to if they've been written to.

If you know the exact OS, paging algorithm, and allocation system you're working with, and you control the entire system's context so nothing weird happens out from under you, it might be possible to reduce the writes from writing all of the bytes back to maybe one per page (which I think is enough for the specific case of MADV_FREE-caused non-deterministic reads) or to no writes at all (with a pager/allocator with no overcommit that always maps a real page). But if you want to be portable to all current and future machine configurations, you just have to tolerate that uninitialized memory is uninitialized.

There are specific cases where you can extract stronger guarantees, such as that the memory has in fact been initialized by someone (which could be the allocator through e.g. calloc), and in that case you can just read the memory. It's just not portable to when you're dealing with actually uninitialized memory.
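As a sketch of the "initialized by the allocator" case: Rust's std::alloc::alloc_zeroed plays the role calloc plays in C, so the returned bytes are initialized and may simply be read. The function name sum_of_zeroed_page and the 4096-byte size and alignment are assumptions made for illustration.

```rust
use std::alloc::{alloc_zeroed, dealloc, handle_alloc_error, Layout};

/// Allocates one zeroed 4096-byte block, reads every byte, and frees it.
fn sum_of_zeroed_page() -> u64 {
    let layout = Layout::from_size_align(4096, 4096).expect("valid layout");
    unsafe {
        let ptr = alloc_zeroed(layout);
        if ptr.is_null() {
            handle_alloc_error(layout);
        }
        // Sound: the allocator guarantees these bytes are initialized to zero.
        let bytes = std::slice::from_raw_parts(ptr, layout.size());
        let sum: u64 = bytes.iter().map(|&b| u64::from(b)).sum();
        dealloc(ptr, layout);
        sum
    }
}
```

The guarantee here comes from the allocator's contract, not from inspecting the memory after the fact, which is the distinction the surrounding argument relies on.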

You can argue that e.g. padding bytes aren't uninitialized, and that reading them is sound. That's a defensible position to hold, since padding bytes are uninit such that they can get spuriously read and written by the compiler. Per Rust's current guarantees, they're fully uninit and treating them as init is instant UB. I'd provide a counterargument of a padding over the size of a page plus MADV_FREE leading to problems again.

But that's not arguing that reading uninitialized memory is sound (it isn't). It's arguing that in this specific case, where the specifications say the memory is uninitialized, it has actually been initialized, so it's fine to treat it as initialized. And if it has been initialized, that's potentially fine (so long as you're comfortable with it getting clobbered).

But the important point here is that uninitialized memory is UB, full stop.

A guarantee that the memory is initialized doesn't make reading uninitialized memory not UB. It makes the memory not uninitialized.

Reading uninitialized memory is UB.


The code in #1 and #28 cannot possibly touch memory-mapped IO segments. Owned heap and stack and & referenced memory can only be main memory.

Rust for Embedded C Programmers | OpenTitan Documentation - footnote 66 - referencing the fact that spurious reads to & referenced memory are legal, which would badly mess with MMIO regions.

However, main memory can be MADV_FREE.
