TLDR: As I finished writing up my answers, I think I finally locked in on what the root of the debate here is:
Should a value be considered produced if it's ephemeral and it's replaced before anyone else has a chance to read it? Can a value unknown to all influence the behaviour of the application?
In a more real-world, albeit contrived, example: can my writing down the base64-encoded private key for Satoshi Nakamoto's wallet and hiding it in a backpack in the woods impact your life, IF you don't have knowledge about it? Even if you do have knowledge about it, it doesn't necessarily impact your life, because you may or may not go searching for it, depending on your conscience, but if you don't know about it, it CANNOT impact your life.
@mdHMUpeyf8yluPfXI
No, I have not, and that was an interesting read, thank you. But note that even in the article, the thing that makes the code UB is that the code may be optimized and rearranged in a way that the boolean is read and interpreted. In this case the whole argument is that it is overwritten before that can happen. No code accessing the value that is written after the assignment can be moved before it (data barrier).
I interpret the article as basically saying that although UB is triggered when an invalid value is actually read and interpreted, you might be wrong thinking that some value is not used, because the compiler can rearrange the code in a way that you don't expect, but that doesn't seem to apply to this particular case.
@steffahn
Sure, MaybeUninit can be used in this case, but that does present an (arguably small) performance impact, since assume_init needs to be called on the MaybeUninit on each access, and assume_init doesn't appear to be a const fn. The impact is even worse when writing an accessor to hide this API. I'm not sure whether .assume_init() can in reality be optimized away so that it would be zero-cost in practice, but currently casts cannot be const fns, I think.
There’s literally no downside to using a MaybeUninit, except perhaps that you have to explicitly use the MaybeUninit API. But if there’s so many places where the static mut is unsafely accessed the convenience of being more concise than explicit MaybeUninit API usage allows is a concern, feel free to write a wrapper type implementing Deref and maybe even DerefMut.
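For reference, the wrapper suggested in that quote could look roughly like the sketch below. The names InitOnce and Config are hypothetical and not from the original snippet, and the constructor is marked unsafe on the assumption that the caller promises to call init() before the first dereference. It uses a local value rather than a static mut to keep the example short.

```rust
use std::mem::MaybeUninit;
use std::ops::Deref;

/// Hypothetical example payload; stands in for whatever the static holds.
pub struct Config {
    pub verbose: bool,
    pub retries: u8,
}

/// Hypothetical wrapper: storage that starts uninitialized and is
/// written before anyone reads it.
pub struct InitOnce<T> {
    slot: MaybeUninit<T>,
}

impl<T> InitOnce<T> {
    /// Creates an uninitialized slot.
    ///
    /// # Safety
    /// The caller promises to call `init` before the first dereference.
    pub const unsafe fn uninit() -> Self {
        InitOnce { slot: MaybeUninit::uninit() }
    }

    /// Stores a value. Writing into a MaybeUninit never reads or drops
    /// the previous contents, so no invalid value is ever produced.
    pub fn init(&mut self, value: T) {
        self.slot.write(value);
    }
}

impl<T> Deref for InitOnce<T> {
    type Target = T;

    fn deref(&self) -> &T {
        // SAFETY: relies on the contract of `uninit`, i.e. `init` has
        // already been called by the time anyone dereferences.
        unsafe { self.slot.assume_init_ref() }
    }
}
```

With this in place, `cfg.verbose` on an initialized `InitOnce<Config>` reads like a plain field access, and the unsafe part is confined to the one place where the slot is created.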
While the performance impact is arguably small, and it might be a case of premature optimization, which is the root of all evil, I'm also interested in the theoretical side of this. The original snippet was written as part of an embedded, real-time project where CPU is a scarce resource and every CPU cycle counts. Single CPU, single-threaded operation.
If all accesses after initialization are read-only and safe anyways
Yes, all accesses need to be read-only, that's for sure.
@afetisov
Because constructing an invalid bool is UB, regardless whether you read it or not.
That's a statement, not an argument, and it is actually the question I'm looking for an answer to. Why would constructing an invalid bool be UB, IF and only IF I can guarantee it won't be read before it's overwritten with a valid value?
Nobody promised you that, that's just your assumptions. The compiler is free to implement mem::swap in whatever way it sees fit.
You are absolutely right. Let's swap out mem::swap with libc::memcpy calls to be on the safe side.
This means that even if mem::swap specifically works as you expect, you still get UB and can get miscompilations in a different part of program, because the compiler has derived incorrect conclusions from your invalid operations.
For example?
But more importantly, your basic mental model of the compiler is incorrect. "split-nanosecond", "overwritten before anyone has a chance to do so": do you think you're playing catch with the compiler? Or that you should and could trick the hardware into overlooking your shenanigans? That's not how it works. If your code is incorrect "for a split-nanosecond", it's just incorrect period, and the behaviour of your entire program is Undefined.
That was unnecessarily emotional. The hardware executing the resulting code does not have a notion of types or casts. If no code is generated with the wrong assumptions (a good example: do this if there is a 0 in this 8-bit space, do that if there is a 1 in it, with no other cases), it's not going to do anything undefined. I'm not sure what you mean by playing catch with the compiler, but I'm pretty sure that the hardware executes instructions in sequence (assuming single-CPU execution).
When calling calloc, for example, you will have random data in the memory handed out by the allocator, but that is zeroed out before anyone has a chance to access it (calloc zeroes it out before returning). So it's a similar, but somewhat different case. The memory holds non-zero data for a split nanosecond, but nobody cares, because it's zeroed out before any other part of the codebase "knows" about its existence. Same case here: the invalid value is overwritten with a valid one before any other part of the codebase "knows" about its existence. Any code looking at the variable sees a valid value there.
and rewrite it into an entirely different program with the only restriction being that it must have exactly the same observable behaviour.
That includes the restriction that reads of a value cannot be moved before writes to it. So all reads need to happen after it is initialized. Note: we are single-threaded, but that doesn't even matter, because even in a multithreaded environment, when main() is invoked it is running a single thread only, and it will fork off into multiple threads later on.
Unsafe code must obey exactly the same rules as safe code. You don't get to "turn off the borrow-checker" or "turn off the type system", you just get the capability to perform some new dangerous operations, and it's your responsibility to ensure they uphold the same rules as safe code.
Exactly. That's why the snippet comes with a big fat warning and strict rules that the program needs to adhere to, in order to guarantee that any read of the memory region already sees initialized data.
@Coding-Badly
Yep, I was just trying to jog my memory about AVR processors having something like that.
@alice
The fact that the invalid memory is behind a raw pointer is very important for the UB rules, because raw pointers are not required to point at a valid value.
I think that is considered UB because safe code may be optimized in a way that it accesses that invalid region, while *mut can only be dereferenced in unsafe code, where the compiler can rely on the programmer to guarantee that access only happens if the memory is verified to be valid. (Also I think unsafe code is not rearranged by the compiler?) Or simply because by doing that you will most likely shoot yourself in the foot, since in most cases it cannot be guaranteed that no accesses happen to that variable, even in safe code.
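To make the raw-pointer distinction concrete, here is a minimal sketch, assuming the storage is a MaybeUninit<bool> rather than a plain bool (the function name scratch_then_valid is made up for illustration). An arbitrary byte is written through a *mut u8, then overwritten with a valid bit pattern, and only afterwards is the storage treated as a bool. As far as I understand the rules, no bool value is produced until assume_init, so the ephemeral 0xFF is never an "invalid bool", just a byte.

```rust
use std::mem::MaybeUninit;

/// Hypothetical illustration: scribble on the storage first, and only
/// produce a `bool` once the byte holds a valid bit pattern.
pub fn scratch_then_valid() -> bool {
    let mut slot = MaybeUninit::<bool>::uninit();
    // View the storage as a raw byte; raw pointers are not required to
    // point at a valid value of any particular type.
    let byte_ptr = slot.as_mut_ptr() as *mut u8;
    unsafe {
        // Ephemeral "garbage": this is a u8 write, not a bool.
        byte_ptr.write(0xFF);
        // Overwrite with a valid bool bit pattern (1 == true).
        byte_ptr.write(1);
        // Only here is a bool value actually produced.
        slot.assume_init()
    }
}
```

The difference from the original snippet, as I read the discussion, is only in the type of the storage: the bytes in memory go through the same sequence either way.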
However, that cannot be used to argue that it isn't UB because the relationship between UB and miscompiled is only a one-way relation:
I agree, however I'm starting to have the feeling that we are conflating two things: UB-suspicious code, as in the badly written code itself, and the actual UB that happens when the hardware is running the UB-suspicious binary and triggers a code path where it starts exhibiting undefined behaviour, i.e. doing random stuff that is not what we would expect by looking at the code.
you must refer to the rules for UB instead.
The rule you must be referring to is this: Producing an invalid value, even in private fields and locals. Which is clearly documented, and I guess we all agree on why it's bad.
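One concrete way the compiler leans on that rule, sketched under the assumption of current rustc behaviour (the layout is not something I'd claim the language spec promises): the bit patterns that are invalid for bool are reused to encode None in Option<bool>, so the option takes no extra space. The helper names below are hypothetical. The second function shows the conservative alternative of keeping the data as a u8 until it has been validated.

```rust
use std::mem::size_of;

/// True if `Option<bool>` fits in the same space as `bool`, which is
/// what rustc currently does by using an invalid bool bit pattern
/// as the encoding of `None`.
pub fn option_bool_uses_niche() -> bool {
    size_of::<Option<bool>>() == size_of::<bool>()
}

/// Hypothetical helper: turn a raw byte into a bool only if it holds
/// one of the two valid bit patterns, instead of transmuting blindly.
pub fn bool_from_byte(byte: u8) -> Option<bool> {
    match byte {
        0 => Some(false),
        1 => Some(true),
        _ => None, // any other pattern is not a valid bool
    }
}
```

This doesn't settle the "ephemeral value" question by itself, but it shows that the validity of a bool is baked into layout decisions and not only into the code that branches on it.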