Uninitialized memory varying from read to read

In the MaybeUninit docs:

Moreover, uninitialized memory is special in that it does not have a fixed value (“fixed” meaning “it won’t change without being written to”). Reading the same uninitialized byte multiple times can give different results. This makes it undefined behavior to have uninitialized data in a variable even if that variable has an integer type, which otherwise can hold any fixed bit pattern.

I can see how if you trigger UB the compiler could compile your code in a way that it's as if reading a memory location twice gave different results, because the optimizer could decide returning anything is valid and make different choices in different contexts, but this seems to be saying the causality can go the other way -- that there could be an odd CPU/arch/OS where uninit memory would give different results when repeatedly read, and that this is the reason uninitialized memory must be UB? Can that somehow happen?

I'd expect that at the asm level, if I read a memory location without writing to it first, I might get any value, but that no matter how many times I read from it without writing to it in between, I won't get a different result (excluding remapping the address with mmap). Is that not true?

Yes. Memory tagged with MADV_FREE can asynchronously change from a nonzero value to a zero value before it is initialized: madvise(2) - Linux manual page.

I'm guessing memory-mapped IO doesn't count here, does it?

Because when you map hardware registers to addresses it's really common for the memory to change in the background (e.g. because that address is attached to an input) or as a side-effect of reading (e.g. flags will often clear themselves after they've been read).

To be clear, the statement in the documentation is correct regardless of whether the situation is possible from the hardware/OS point-of-view.

Nah I wouldn't consider MMIO to count, I'd consider that to be another device having init'd the memory.

That's the opposite of what the poster asked. That would be initialized memory changing, wouldn't it? That looks like reading 5, 5, 5, then 0s.

Is the result of an madvise free tagged as uninit? Sounds legit for that. Seems like a rather strong blanket assertion for such a corner case.

I can see other cases of memory changing underneath you, but those are more the case of you writing to a region controlled by someone else, and that seems different.

The memory is semantically not initialized - that's the only reason it's valid to madvise it with MADV_FREE.

That's a little odd. It asserts some supremacy of the compiler over the hardware that it actually runs on.

Not all UB is the same, and it isn't free license for the compiler to do anything it wants - or claim it can do anything it wants in the future even if it doesn't do it right now. If the compiler noticed UB and decided to call exit or reboot, that would be a bug. It deciding to scribble over memory for little reason is also a bug.

This seems like those fairy tales told to kids about monsters that will eat them if they go into the woods at night, so they don't go out of fear. Sooner or later, they figure it out and then stop believing you in other things too.

Not following. You advise free after you use the memory, to allow the VM to reclaim it if needed. So it has been initialized and used already. The advise free puts it in a weird, possibly initialized state semantically, but Rust doesn't know anything about that afaik.

A similar thing happens with object pools. It goes from used, you put it back in the pool, which might zero it. Same effect, but that wouldn't be called uninitialized, would it?

But after that it is always zeros which seems counter to the description of it changing each call.

UB is a license for the compiler to behave as if the same piece of uninitialized memory has different values from two different reads even if no write could have changed it. LLVM already does so today under some circumstances — I have made some simple examples of it in the past in a different conversation, and I can try to reconstruct one of them if you don't believe me. I also suspect that you would find it difficult to convince the LLVM maintainers that it is an actual bug.

It's been tried by better people than me. I am far from some corner minority. It is similar to any organizational issue. UB is seen as open territory, free space, so everybody runs to claim it as their own, with all the resulting fighting in the process. Compiler writers claim it for nebulous future optimizations, explicitly disallowing it in the process. Devs claim it as something needed to eke the most out of how the hardware actually works.

I know why UB strangeness happens sometimes. Other times it is laziness, since some compiler and language people believe that they write to a spec, so once the spec is satisfied they shouldn't have to do more work, and then think that programmers should write to them.

It is a very poor way of viewing the process - all the tools along the way are just means to an end. The spec isn't on a stone tablet that came from Sinai. These are just tools to help the programmer write something for the hardware it will run on. Not run interference between the two.

Part of this specific UB problem is self-inflicted. It could treat a read of that memory region like any other memory read. Or it could always make it a constant value. Instead Rust follows the idea of MOST surprise and just starts folding stuff away.

The idea here is that reading uninitialized memory already indicates that a logic error is occurring, since it has no relationship to whatever one's program is trying to compute. The only way to get consistent data from uninitialized memory would be to subtract it or XOR it from itself (or equivalent), which could just as easily be done by memsetting it to 0 directly. That's why Rust introduces the MaybeUninit abstraction, so that the programmer can be more explicit as to when the data is fully initialized.
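As a minimal sketch of that explicitness (the function name `squares` and the array size are made up for illustration), the slots below start out uninitialized, each one is written exactly once, and only then is the array treated as fully initialized:

```rust
use std::mem::MaybeUninit;

/// Hypothetical helper: builds `[0, 1, 4, 9]` without ever reading an
/// uninitialized `u32`.
fn squares() -> [u32; 4] {
    // Each slot is explicitly "maybe uninitialized"; nothing is read yet.
    let mut slots = [MaybeUninit::<u32>::uninit(); 4];

    // Write every slot exactly once.
    for (i, slot) in slots.iter_mut().enumerate() {
        slot.write((i * i) as u32);
    }

    // SAFETY: the loop above wrote all four slots, so every element
    // holds a valid, initialized `u32`.
    slots.map(|slot| unsafe { slot.assume_init() })
}
```

The `unsafe` call marks the one place where the programmer asserts that initialization is complete; calling `assume_init` on a slot that was never written would be UB.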

Note that even that isn't enough. It's in fact one of the common misconceptions called out in the LLVM-IR reference: https://releases.llvm.org/14.0.0/docs/LangRef.html#undefined-values

Ah, I meant in the hypothetical case where uninitialized memory does have a definite value. Since having a definite value doesn't help programs in any meaningful way (as I described), compilers can take the opportunity to make the value indefinite and UB to read.

That's literally what undefined behavior is. If the compiler is limited as to what a certain language construct may do, that specified limitation is something that defines the behavior of that construct in some way (not necessarily a unique possible behavior).

It literally is. This is incorrect.

While both make sense on their own without context, it's too late to argue the latter without your own toolchain. In the 1980s there was some heated debate between the two sides, and the compiler devs won and made their claims common sense.

To be fair, how I got to digging into these semantics is that in unsafe code Rust's semantics cause it to occur in situations with no logic error. Because struct padding is considered uninitialized, if you copy a struct into an already zero-initialized &mut [u8] you have violated the guarantee that references only point to fully initialized data, since the uninitialized-ness of the padding bytes is viral. So copying a struct into such a slice and back out with no other operations is considered UB, which is very surprising, especially since in principle "moves are just bitwise copies" and this is doing the same thing manually.

Copying a struct with padding into &mut [u8] is a logic error. You need to use &mut [MaybeUninit<u8>] instead.
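A rough sketch of what that could look like. The struct `Padded` and the helpers `store` and `load` are hypothetical names invented for this example, and it assumes a `#[repr(C)]` layout with 3 padding bytes after `a`:

```rust
use std::mem::{size_of, MaybeUninit};
use std::ptr;

// Hypothetical struct: with repr(C) there are 3 padding bytes after `a`.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
struct Padded {
    a: u8,
    b: u32,
}

/// Copies the raw bytes of `value`, padding included, into `buf`.
/// The element type `MaybeUninit<u8>` admits that some bytes may be uninit.
fn store(value: &Padded, buf: &mut [MaybeUninit<u8>]) {
    assert!(buf.len() >= size_of::<Padded>());
    // SAFETY: source and destination are valid for `size_of::<Padded>()`
    // bytes and cannot overlap (`buf` is an exclusive borrow).
    unsafe {
        ptr::copy_nonoverlapping(
            value as *const Padded as *const MaybeUninit<u8>,
            buf.as_mut_ptr(),
            size_of::<Padded>(),
        );
    }
}

/// Reads a `Padded` back out of bytes previously filled by `store`.
fn load(buf: &[MaybeUninit<u8>]) -> Padded {
    assert!(buf.len() >= size_of::<Padded>());
    // SAFETY: the caller filled `buf` via `store`, so the field bytes are
    // initialized; the buffer may be under-aligned, hence `read_unaligned`.
    unsafe { ptr::read_unaligned(buf.as_ptr() as *const Padded) }
}
```

Usage would be along the lines of `let mut buf = [MaybeUninit::<u8>::uninit(); 8]; store(&p, &mut buf); let q = load(&buf);`. At no point does a `&[u8]` or `&mut [u8]` claim that the padding bytes are initialized.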

Even in C, after copying a struct out into a byte array you shouldn't consider the whole array to be initialized. The only exception the spec mentions is the moment right after you memset the struct to 0 (specifically memset 0) and copy it out as a byte array.

I wonder whether that can be applied to #[repr(C)] Rust structs initialized with std::mem::zeroed(), since that's what C does.

Optimizations have always been why UB exists.

For example, this is UB:

let mut x = 3;
let mut y = 4;
// UB: writes one element past `x`, hoping to land on `y`.
unsafe { ptr::addr_of_mut!(x).add(1).write(5) };
dbg!(y);

Why? Because programmers asked the compiler to do register allocation for them. If the compiler was forced to deal with arbitrary pointer BS like this, then y has to go into memory always, it has to always use exactly the same layout of local variables, and a whole bunch more of horrible pessimizations that even original C developers wouldn't accept.
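For contrast, here is a small sketch of the same pointer arithmetic done within the rules (the function name `write_neighbor` is invented for illustration). Putting both values in one array makes them part of a single allocation, so the compiler knows the layout must be honored and `add(1)` stays in bounds:

```rust
/// Hypothetical helper: overwrites the second element through a raw pointer.
fn write_neighbor() -> [i32; 2] {
    // One allocation holding both values, so their relative layout is fixed.
    let mut xy = [3, 4];
    let p = xy.as_mut_ptr();

    // SAFETY: `p.add(1)` points at the second element of `xy`, which is
    // in bounds and valid for writes.
    unsafe { p.add(1).write(5) };

    xy
}
```

Here the programmer has told the compiler that the two values live next to each other, instead of assuming it about two unrelated locals.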

That's the deal. If you use a compiler, you have to follow its rules. If you don't want to, then you can go back to doing register allocation yourself (among 1000 other such things).

Down in hardware you'll see different kinds of "not zero or one" bits more directly, with things like Z and X bits in IEEE 1364. Even at that level the "don't care" is useful for asking the "optimizer" (which might be yourself with a Karnaugh map) to generate better "code".
