I'm a bit troubled here by the notion that merely reading uninitialised memory as bytes leads directly to undefined behaviour.
Well, I did some googling, and came up with this link: EXP33-C. Do not read uninitialized memory, which I hope is a good reference. On this page they say:
If an object that has automatic storage duration is not initialized explicitly, its value is indeterminate.
and following the link we find
indeterminate value [ISO/IEC 9899:2011]
Either an unspecified value or a trap representation.
unspecified value [ISO/IEC 9899:2011]
A valid value of the relevant type where the C Standard imposes no requirements on which value is chosen in any instance. An unspecified value cannot be a trap representation.
A "valid value" (as our relevant type is u8). This isn't "undefined behaviour" yet. However, undefined behaviour seems to occur as soon as we look at the data:
The value of an object with automatic storage duration is used while it is indeterminate (6.2.4, 6.7.9, 6.8).
Damn.
This seems excessively severe to me, but yes, I can see how it arises. We've already established that an "unspecified value" is not a stable value, and so it can produce unpredictable results in code that uses it. I can therefore imagine that, in principle, the following code:
let s = format!("{:02x?}", get_raw_bytes(&padded_object));
could end up with unpredictable (and therefore potentially non-UTF-8) characters in s ... or even worse behaviour.
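For what it's worth, here is a minimal sketch of how I understand the bytes of a padded object can be viewed without claiming that they are initialised. The names Padded and as_maybe_uninit_bytes are hypothetical, made up for illustration, and this is not the get_raw_bytes from the line above. The assumption is that viewing the storage as MaybeUninit<u8> is fine because that type places no requirement on its contents, and that the trouble only starts when a padding byte is asserted to be a real u8.

```rust
use std::mem::{size_of, MaybeUninit};

// Hypothetical example type: with repr(C) the field `a` sits at offset 0,
// and on typical targets padding bytes follow it so that `b` is aligned.
#[repr(C)]
struct Padded {
    a: u8,
    b: u32,
}

// Hypothetical helper: view a value as possibly-uninitialised bytes.
// Assumption: T contains no interior mutability (no UnsafeCell).
fn as_maybe_uninit_bytes<T>(value: &T) -> &[MaybeUninit<u8>] {
    // SAFETY: MaybeUninit<u8> has size 1, alignment 1 and no validity
    // requirement; the slice covers exactly the bytes of `*value` and
    // borrows from it, so it cannot outlive the value.
    unsafe {
        std::slice::from_raw_parts(
            value as *const T as *const MaybeUninit<u8>,
            size_of::<T>(),
        )
    }
}

fn main() {
    let p = Padded { a: 0xAB, b: 7 };
    let bytes = as_maybe_uninit_bytes(&p);
    assert_eq!(bytes.len(), size_of::<Padded>());

    // Byte 0 belongs to the field `a`, so it is initialised.
    // SAFETY: `a` was written when `p` was constructed.
    let first = unsafe { bytes[0].assume_init() };
    assert_eq!(first, 0xAB);
    assert_eq!(p.b, 7);

    // Calling assume_init() on a padding byte is exactly the read that
    // this thread is worried about, so the sketch never does that.
}
```

This does not make the padding printable; it only keeps the "might be uninitialised" fact visible in the type, so each byte has to be justified individually before it is treated as a u8.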
I am aware that this topic has been discussed at huge length, particularly over on Internals... It does seem to me though that Rust is making a deliberate choice here (obviously driven by the existing LLVM back end) which is really making a rod for our own back. If nothing else, couldn't we have some kind of optimisation barrier available, something to say: "treat this code as already initialised"?
I'm sorry for naively rehearsing old stuff ... but I am on Users