In the thread “Any_to_u8 O(1) time?”, the argument was made that reading an uninitialized value can invalidate the entire program due to undefined behaviour / LLVM optimizations.
After thinking on this some more, I am confused for the following reason:
Because of the halting problem, rustc/LLVM cannot know at compile time whether a particular read is guaranteed to be undefined.
Therefore, to preserve program correctness, when I tell Rust to read from a memory address, it had better read from that address.
Given the above, how can “read uninitialized value” be worse than just reading garbage?
That may be true over the set of all programs, but for most programs the compiler can trace control-flow paths well enough to determine that code is unreachable, and it can similarly determine that a use can occur without initialization. The problem with UB, which is a compiler concept, is that it gives the optimization passes within the compiler the freedom to make any consistent assumptions about the undefined variable(s) and then prune code based on the result.
I still don't get it. After all the analysis is done, for a particular load, the possible results are:
(1) this read is always initialized
(2) this read is always uninitialized
(3) we don't know; some other unsafe Rust code may cause this memory location to be initialized
Now, (2) should be a compiler error, and (3) should not be optimized as if it were (2). So where is the “freedom” that LLVM/the compiler has to turn an uninitialized read into bad things?
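For local variables in safe code, rustc already treats reads along the lines of case (2) as errors, and its initialization analysis is conservative: it rejects any read of a local that is only *possibly* uninitialized, so case (3) is rejected too unless you opt out explicitly. A small sketch with a hypothetical function `pick`:

```rust
/// Hypothetical example: rustc's initialization analysis at work.
fn pick(flag: bool) -> i32 {
    let x: i32; // declared, but not yet initialized
    if flag {
        x = 1;
    } else {
        x = 2;
    }
    // Every path assigns `x` before this read, so rustc accepts it.
    // Deleting the `else` branch makes rustc reject the read with error E0381
    // ("possibly-uninitialized"), even though `flag` might always be true at runtime.
    x
}
```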
No, this is the crucial point about UB: the compiler can assume that it never occurs. LLVM's optimizations assume that UB never happens, and are only correct in the absence of UB (or rather, the program is only correct in the absence of UB).
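The way to express case (3), “this memory may not be initialized yet”, to the compiler is `std::mem::MaybeUninit`. A minimal sketch, using a hypothetical helper `fill_squares`: every element is written first, and `assume_init` is called only afterwards, so the “UB never happens” assumption the optimizer relies on actually holds.

```rust
use std::mem::MaybeUninit;

/// Hypothetical helper: builds [0, 1, 4, 9, ...] without zero-filling the buffer first.
fn fill_squares() -> [u32; 8] {
    // Start with memory that the compiler knows is uninitialized.
    let mut buf: [MaybeUninit<u32>; 8] = [MaybeUninit::uninit(); 8];
    for (i, slot) in buf.iter_mut().enumerate() {
        // `write` stores the value without reading the old, uninitialized contents.
        slot.write((i * i) as u32);
    }
    // SAFETY: every element was written in the loop above.
    buf.map(|x| unsafe { x.assume_init() })
}
```

Calling `assume_init` before the writes would be exactly the kind of UB discussed in this thread, even though it is inside an `unsafe` block.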
Suppose we have a struct T which has some uninitialized memory due to padding.
Suppose we have a Vec<T> with n elements.
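To make the padding concrete, here is a sketch with a hypothetical stand-in for `T` (a `u8` followed by a `u32`). On common targets where `u32` is 4-byte aligned, `#[repr(C)]` places 3 padding bytes after `a`, so a `Vec` of n such elements contains 3n bytes that belong to no field.

```rust
use std::mem::{align_of, size_of};

/// Hypothetical stand-in for `T`: a u8 followed by a u32.
#[repr(C)]
#[allow(dead_code)] // the fields exist only to give the struct its layout
struct Padded {
    a: u8,
    b: u32,
}

/// Returns (size, alignment, padding bytes) of `Padded`.
/// Padding = total size minus the bytes actually occupied by fields.
fn layout() -> (usize, usize, usize) {
    let fields = size_of::<u8>() + size_of::<u32>();
    (size_of::<Padded>(), align_of::<Padded>(), size_of::<Padded>() - fields)
}
```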
Suppose we pass this Vec<T> as a char* to a C FFI function, and it reads n * sizeof(T) bytes. This is safe, right? Because the read of uninitialized memory happens in C FFI land, which LLVM isn't optimizing.
Now, suppose we, in Rust, copy this Vec<T> into a [u8] of length n * sizeof(T) and pass that as a pointer to C. Can this cause bad things, because creating the [u8] involves an uninitialized read?
This isn't necessarily true (the C code can be compiled with Clang, which is also built on LLVM), and what matters here is whatever the C standard says is UB. This is complicated and depends on the standard and on whether you're using C or C++.
I don't know what exactly you mean here. How would you copy the data out of the Vec<T>? As long as you use safe Rust, you cannot cause UB. If you want to use ptr::copy, then AFAIK the source must not contain any padding bytes, as reading those is instant UB.
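One way to stay in safe Rust is to serialize each element field by field into a `Vec<u8>` laid out like the C struct, writing explicit zeros where the padding would be. A sketch under stated assumptions: the element type `Item` and the function `to_c_bytes` are hypothetical, and the hard-coded 3 padding bytes assume `u32` is 4-byte aligned (the `const` assertion makes the build fail if that does not hold).

```rust
/// Hypothetical element type with a C-compatible layout:
/// `a` at offset 0, 3 padding bytes, `b` at offset 4.
#[repr(C)]
struct Item {
    a: u8,
    b: u32,
}

// Compile-time check of the layout assumed below.
const _: () = assert!(std::mem::size_of::<Item>() == 8);

/// Copies `items` into bytes matching the C layout, with zeroed padding.
/// Only safe code is used, so no uninitialized byte is ever read.
fn to_c_bytes(items: &[Item]) -> Vec<u8> {
    let mut out = Vec::with_capacity(items.len() * std::mem::size_of::<Item>());
    for item in items {
        out.push(item.a);
        out.extend_from_slice(&[0u8; 3]); // padding, now initialized to zero
        out.extend_from_slice(&item.b.to_ne_bytes()); // native endianness, as C sees it
    }
    out
}
```

The resulting buffer's `as_ptr()` and `len()` can then be handed to the C function; the cost is one extra copy.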
You can get around this by declaring the padding yourself instead of having the compiler do it (by inserting correctly-sized padding fields between the “real” fields and making the struct #[repr(C)]).
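A sketch of that approach, with hypothetical names `Explicit` and `as_bytes`: the padding becomes an ordinary `[u8; 3]` field that is always initialized, a compile-time assertion checks that no hidden padding remains, and the slice can then be viewed as bytes without copying. As above, it assumes a target where `u32` is 4-byte aligned.

```rust
use std::mem::size_of_val;
use std::slice;

/// Hypothetical struct: the padding is a real, always-initialized field.
#[repr(C)]
#[allow(dead_code)] // fields are only read through the byte view
struct Explicit {
    a: u8,
    _pad: [u8; 3], // explicit padding instead of compiler-inserted padding
    b: u32,
}

// Fails to compile if the compiler added any hidden padding.
const _: () = assert!(std::mem::size_of::<Explicit>() == 1 + 3 + 4);

impl Explicit {
    fn new(a: u8, b: u32) -> Self {
        Explicit { a, _pad: [0; 3], b }
    }
}

/// Views a slice of `Explicit` as raw bytes, e.g. to pass `as_ptr()`/`len()` to C.
fn as_bytes(items: &[Explicit]) -> &[u8] {
    // SAFETY: `Explicit` is repr(C) with no implicit padding (checked above), so every
    // byte belongs to an initialized integer field; u8 has alignment 1; the length covers
    // exactly the memory of `items`, and the returned borrow keeps that memory alive.
    unsafe { slice::from_raw_parts(items.as_ptr().cast::<u8>(), size_of_val(items)) }
}
```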
block, is it safe to do an uninitialized read within the unsafe block, or is an uninitialized read so bad that we can't safely do it even inside an unsafe block?
If the compiler knows the code is trying to perform an uninitialized read, then you have UB even in an unsafe block.
From reading a few of the messages, it might also be better to think of the compiler as creating machine instructions (rather than in terms of reads). If something creates UB, the compiler is free to do what it likes: GIGO, garbage in, garbage out.
unsafe does not increase the set of things that you are permitted to do. Instead, it tells the compiler to trust that you are obeying its rules, even if the compiler itself cannot prove that you are complying with them.
An unsafe block does not allow you to perform any operations that would be UB, it just removes a few checks the compiler usually performs to prevent UB in safe code.
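A small illustration of that division of labour, using hypothetical functions built on the real slice methods `get` and `get_unchecked`: the unsafe version skips the bounds check, but an out-of-bounds index is still UB. `unsafe` only moves the duty of checking from the compiler to the caller, and the optimizer may assume the caller did its part.

```rust
/// Safe version: the bounds check always runs; an out-of-range index returns None.
fn get_checked(v: &[u32], i: usize) -> Option<u32> {
    v.get(i).copied()
}

/// Unsafe version: no bounds check.
///
/// # Safety
/// The caller must guarantee `i < v.len()`. Violating this is UB even though
/// the read happens inside `unsafe`; the compiler assumes the contract holds.
unsafe fn get_unchecked(v: &[u32], i: usize) -> u32 {
    // SAFETY: forwarded from this function's contract.
    unsafe { *v.get_unchecked(i) }
}
```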
Thanks for your patience and explanations.
byte-level “read/write” equivalence would severely limit the compiler's ability to do optimizations
the compiler thus requires/assumes the code operates at a “higher abstraction”, namely “defined Rust behaviour”, and does optimizations assuming everything is within this domain
my uninitialized reads break this assumption, and thus, all hell could potentially break loose
It's true that the halting problem means there are many reads where the compiler cannot know. However, there are a great many cases where it can, particularly once inlining gets involved. And just because the compiler doesn't know today doesn't make the code sound; the terrible thing about UB is that it often seems to work initially, then breaks later when you change something unrelated.
Specifically, in LLVM uninitialized values are represented by a special “undef” value, and further program analysis can see it and insert nasal demons in its place.