Reading uninitialized value vs undefined behaviour


#1

In “Any_to_u8 O(1) time?” the argument was made that reading an uninitialized value can invalidate the entire program due to undefined behaviour / LLVM optimizations.

After thinking on this some more, I am confused for the following reason:

  1. rustc/LLVM cannot know at compile time, because of the halting problem, whether a particular read is guaranteed to be uninitialized

  2. therefore, to preserve program correctness, when I tell Rust to read from a memory address, it had better read from that address

  3. given the above, how can ‘reading an uninitialized value’ be worse than just reading garbage?


#2

That may be true over the set of all programs, but for most programs the compiler can trace flow paths well enough to determine that code is unreachable, and similarly it can determine that a use can occur without initialization. The problem with UB, which is a compiler concept, is that it gives the optimization passes within the compiler freedom to make any consistent assumptions about the undefined variable(s) and then prune code based on the result.


#3

I still don’t get it. After all the analysis is done, for a particular load, the results are:

  1. this read is always initialized
  2. this read is always uninitialized
  3. we don’t know; some other unsafe Rust code may cause this memory location to be initialized

Now, (2) should be a compiler error, and (3) should not be optimized as (2). Therefore, where is the ‘freedom’ that LLVM/the compiler has to turn an uninitialized read into bad things?
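
For safe Rust, case (2) is in fact already rejected: rustc's definite-initialization check refuses any use of a local binding that is not assigned on every path. A minimal sketch of that check (the function name `pick` is made up for illustration):

```rust
/// Deferred initialization that rustc accepts, because every control-flow
/// path assigns `x` before it is read.
fn pick(flag: bool) -> i32 {
    let x: i32;
    if flag {
        x = 1;
    } else {
        // Deleting this branch makes rustc reject the function with
        // error E0381 (binding used while possibly uninitialized).
        x = 2;
    }
    x
}
```

This check only covers local bindings. Reads through raw pointers in unsafe code are not covered by it, which is where the question in this thread comes from.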


#4

No, this is the crucial point about UB - the compiler can assume that it never occurs. LLVM’s optimizations will assume that UB never happens, and are only correct without the presence of UB (or rather, the program is only correct without the presence of UB).
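
A rough sketch of what "the compiler can assume it never occurs" means in practice, using `std::hint::unreachable_unchecked`, whose documented contract is that reaching it is UB. The function `div_nonzero` is a hypothetical example, and whether a particular compiler version really removes the branch is an assumption, not a guarantee:

```rust
use std::hint::unreachable_unchecked;

/// # Safety
/// The caller must guarantee `divisor != 0`.
unsafe fn div_nonzero(n: u32, divisor: u32) -> u32 {
    if divisor == 0 {
        // Reaching this call is UB, so the optimizer is allowed to assume
        // the branch is never taken and may delete it, together with the
        // division-by-zero panic check that `/` would otherwise need.
        unsafe { unreachable_unchecked() }
    }
    n / divisor
}
```

Called with a non-zero divisor this behaves like ordinary division. Called with zero, there is no defined result at all: the "freedom" is that the compiler never had to produce code for that case.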


#5

Suppose we have a struct T which has some uninitialized memory due to padding.

Suppose we have a Vec<T> with n elements.

  1. Suppose we pass this Vec<T> as a char* to a C FFI function, and it reads n * sizeof(T) bytes – this is safe, right? Because the read of uninitialized memory happens in C FFI land, which LLVM isn’t optimizing.

  2. Now, suppose we, in Rust, copy this Vec<T> as a [u8] of length n * sizeof(T), and pass this as a pointer to C – then this can cause bad things, because the creation of the [u8] can involve an uninitialized read?
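
To make the padding concrete, here is a small sketch that measures it. The field layout of `T` is invented for illustration, and the numbers in the comments assume a mainstream target where `u32` is 4-byte aligned:

```rust
use std::mem::size_of;

/// Hypothetical stand-in for the `T` in the post above.
#[repr(C)]
struct T {
    a: u8,  // 1 byte at offset 0
    // 3 bytes of compiler-inserted padding so that `b` is 4-byte aligned
    b: u32, // 4 bytes at offset 4
}

/// Number of padding bytes in one `T`: total size minus the field sizes.
fn padding_bytes() -> usize {
    size_of::<T>() - (size_of::<u8>() + size_of::<u32>())
}
```

Those padding bytes are never written by normal field assignment, so a byte-wise view of a `Vec<T>` would include bytes that Rust treats as uninitialized.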


#6

In other words, in part because of undecidability, you absolve the compiler of program correctness as soon as you violate LLVM’s rules on UB.


#7

This isn’t necessarily true (the C code can be compiled with clang), and what matters here is whatever the C standard says is UB. This is complicated and depends on the standard and whether you’re using C or C++.

I don’t know what exactly you mean here. How would you copy the data out of the Vec<T>? As long as you use safe Rust, you cannot cause UB. If you want to use ptr::copy, then AFAIK the source must not contain any padding bytes, as reading those is instant UB.

You can get around this by declaring the padding yourself instead of having the compiler do it (by inserting correctly-sized padding fields between the “real” fields and making the struct #[repr(C)]).
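
A minimal sketch of that workaround, assuming a made-up struct `Explicit` and a helper `as_bytes` (neither comes from the thread). With the padding declared as a real field, every byte of the struct belongs to an initialized field, so a byte view no longer touches uninitialized memory:

```rust
use std::mem::size_of_val;
use std::slice;

/// Hypothetical struct with the padding written out as a real field.
#[repr(C)]
#[derive(Clone, Copy)]
struct Explicit {
    a: u8,
    _pad: [u8; 3], // takes the place of the implicit padding
    b: u32,
}

/// View a slice of `Explicit` as raw bytes, e.g. to hand to C.
fn as_bytes(items: &[Explicit]) -> &[u8] {
    // SAFETY (under the layout assumption above): `Explicit` is `repr(C)`
    // with no implicit padding, so all `size_of_val(items)` bytes are
    // initialized, and `u8` has alignment 1.
    unsafe { slice::from_raw_parts(items.as_ptr().cast::<u8>(), size_of_val(items)) }
}
```

The price is that `_pad` has to be filled in whenever a value is constructed, and the layout has to be re-checked by hand if fields are added or reordered.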


#8

If I have an

unsafe {
}

block, is it safe to do an uninitialized read within the unsafe block, or is an uninitialized read so bad that we can’t even safely do it inside an unsafe block?


#9

If the compiler knows code is trying to allow an uninitialized read, then you have UB even in an unsafe block.

From reading a few of the messages: it might also be better to think of the compiler as creating machine instructions (rather than in terms of reads). If something creates UB, the compiler is free to do what it likes. GIGO: garbage in, garbage out.


#10

unsafe does not increase the set of things that you are permitted to do. Instead it tells the compiler to trust that you are obeying its rules, even if the compiler itself is not able to prove to itself that you are complying with those rules.


#11

An unsafe block does not allow you to perform any operations that would be UB, it just removes a few checks the compiler usually performs to prevent UB in safe code.
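
As a sketch of what staying within the rules looks like inside an unsafe block, `std::mem::MaybeUninit` lets you hold uninitialized storage, as long as it is written before it is read as a value. The function name `init_then_read` is invented for illustration:

```rust
use std::mem::MaybeUninit;

/// Reserve uninitialized storage, initialize it, then read it.
fn init_then_read() -> u32 {
    // No `u32` value exists yet; `slot` is just storage.
    let mut slot = MaybeUninit::<u32>::uninit();
    slot.write(42);
    // SAFETY: `slot` was fully initialized by the `write` just above.
    // Calling `assume_init` without that write would be UB, and the
    // `unsafe` block would do nothing to change that.
    unsafe { slot.assume_init() }
}
```

The unsafe block here only marks the spot where the programmer, not the compiler, vouches for the initialization.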


#12

@jschievink, @TomP, @jonh

Thanks for your patience / explanations. Requiring byte-level “read/write” equivalence would severely limit the compiler’s ability to do optimizations

the compiler thus requires/assumes the code operates in a “higher abstraction”, namely “defined Rust behaviour” – and does optimizations assuming everything is within this domain

my uninitialized reads break this assumption, and thus, all hell could potentially break loose


#13

It’s true that the halting problem means that there are many reads where the compiler cannot know. However, there are a great many cases where it can, particularly once inlining gets involved. And just because the compiler doesn’t know today doesn’t make it sound – the terrible thing about UB is that it often seems to work initially, then breaks later when you change something unrelated.


#14

Such as a new, better-optimizing release of LLVM.


#15

Specifically, in LLVM uninitialised values are represented by a special “undef” value, and then further program analysis can see it and insert nasal demons in its place.