There is an unsafe block so std is totally free to define this as UB as if std::hint::unreachable_unchecked() was called... though... how is it done, exactly?
Reading the discriminant can be moved out of the block; the only unsafe operation is the std::mem::transmute call, so that must be the one invoking UB. There seem to be two possibilities for how it does so:
[undefined.intrinsic] Invoking undefined behavior via compiler intrinsics.
I would imagine this like "transmute may check whether the source type is Discriminant<T>, and if it is, insert a spurious unreachable_unchecked() call". Doesn't seem applicable to other ways of reinterpreting a value (like ptr::read_unaligned) though.
[undefined.invalid] Producing an invalid value.
However, the only invalid value I know for the primitive number types is uninitialized memory.
It is certainly possible, and no trait on Discriminant<T> rules that out.
Question
Now, my question is: if one day I decided to make a structure S which would invoke immediate UB upon an attempt to transmute it to a primitive, what could I do? One option I see is carrying around a MaybeUninit<...> to have some uninit bytes in the struct.
It's done by not providing any documentation regarding the contents of Discriminant. Hence transmuting it to anything doesn't provide any guarantees, hence: undefined behavior.
I guess technically it's the "invalid value" possibility.
You could make the fields private and write in the documentation "I give no guarantees regarding the layout of the struct", thus making transmute UB.
I don't think that's valid. For example, the layout of #[repr(Rust)] structs also doesn't have any documentation, and you can't really rely on it for anything, but simply printing offset_of!(Struct, field) isn't considered UB. It's undetermined, and could change on any compilation, but not Undefined in the sense of "blows up the Abstract Machine".
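To make the offset_of! point concrete, here is a minimal sketch. The struct `Example` and the helper `field_offsets` are made up for illustration. The values returned are whatever this particular compilation chose, yet computing them involves no unsafe code at all:

```rust
use std::mem::offset_of;

// Hypothetical struct with the default (unspecified) Rust layout.
#[allow(dead_code)]
struct Example {
    a: u8,
    b: u32,
    c: u16,
}

/// Returns the byte offsets of `a`, `b` and `c` as chosen by this build.
/// The numbers are unspecified and may differ between compilations,
/// but asking for them is an ordinary, well-defined operation.
fn field_offsets() -> [usize; 3] {
    [
        offset_of!(Example, a),
        offset_of!(Example, b),
        offset_of!(Example, c),
    ]
}
```

Code that hard-codes any of these numbers is relying on an unpromised detail, which is a separate problem from the program itself being undefined.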
I think the docs are just overly defensive here. Even if there is no good reason for transmute::<Discriminant, u8>() to be undefined, there is no good reason to define it either, and so one must exercise caution. That said, it's technically possible for the discriminant to have arbitrary internal padding, which would make it impossible to interpret as an integer without UB, even if the sizes match. I wouldn't be surprised if this possibility happens in practice.
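The padding argument can be sketched with a made-up struct. `Padded` and `padding_bytes` are hypothetical, and `#[repr(C)]` is used only so the layout is predictable; the numbers assume a typical target where u32 has 4-byte alignment. Nothing here is claimed about how Discriminant is actually laid out:

```rust
use std::mem::size_of;

// Hypothetical struct; `repr(C)` is used only to make the layout predictable.
#[repr(C)]
#[allow(dead_code)]
struct Padded {
    tag: u8,
    value: u32,
}

/// Number of bytes in `Padded` that belong to no field.
/// On typical targets (4-byte aligned `u32`) this is 3.
fn padding_bytes() -> usize {
    size_of::<Padded>() - (size_of::<u8>() + size_of::<u32>())
}
```

On such a target `Padded` has the same size as u64, so `transmute::<Padded, u64>` would compile, but the padding bytes are uninitialized and reading them as part of an integer is UB. Matching sizes alone say nothing about validity.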
But I was talking about transmute, not about offset_of. transmute can potentially map the unknown internal representation of Discriminant to an invalid value, hence it's UB.
I guess maybe transmuting a 1-byte Discriminant to u8 isn't UB when you know there are at least 2 possible discriminants? I'm not sure about this. But even if that case is fine, it would be a special case, in general it's UB.
I think this comment sheds a lot of light on your needs, and I thank you for it. My initial reaction is that by defining the behaviour of something, it ceases to be "undefined behaviour" (and I stand by that), but now I think I have a better handle on what it is you wanted to nail down.
I think you have reached the expectation that undefined behaviour is like errors and exceptions: when your program does thing X, the computer does thing Y in response; in this case, "thing X" is "transmute a Discriminant" and "thing Y" is "exhibit undefined behaviour." That's a fairly natural reading, and one I think Rust encourages by making so many other categories of undefined behaviour hard to express, but it's not a useful model for describing why undefined behaviour might exist or what evaluations could have undefined results.
Undefined behaviour, instead, is a set of characteristics that any analysis of your program is free to assume never happens. "Analysis" here includes formal correctness analysis, type checking, compilation and translation, and execution - they are all ways of exercising the properties of your program. In that lens, something that is documented as being undefined is undefined - even if you can see how to build it out of well-defined parts (and even if it is actually built out of those parts under the hood).
In other words, @tczajka is right: in order to declare that transmuting your struct is undefined, at least to consumers of your code, all you need to do is say so. You do not need to put any technical measures in place to ensure that the compiler emits specific code, or to ensure that evaluation of that undefined operation produces any specific result. There are things you can do to make it harder for your downstream callers to inadvertently rely on undefined characteristics, since the actual implementation will necessarily exhibit some observable behaviours anyways.
At the end of the day "definedness" comes from specification, not from implementation, so if you say that an operation is undefined, callers must avoid performing that operation, or risk their programs leaving the realm of well-specified Rust behaviours, even if their actual program is completely composed of otherwise well-defined behaviours.
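As a rough sketch of "all you need to do is say so", a type can keep its fields private and state in its documentation that the layout is unspecified. The `Handle` type and its methods below are invented for illustration:

```rust
/// An opaque handle (hypothetical type, for illustration only).
///
/// # Layout
///
/// No guarantees are made about the size, alignment, or bit pattern of
/// this type. Transmuting it to or from any other type is not supported,
/// and code that does so is outside what this API specifies.
pub struct Handle {
    // Private: downstream code has no safe way to observe this field.
    raw: u32,
}

impl Handle {
    /// Creates a handle for the given id.
    pub fn new(id: u32) -> Self {
        Handle { raw: id }
    }

    /// The supported way to obtain the numeric id.
    pub fn id(&self) -> u32 {
        self.raw
    }
}
```

A caller who transmutes `Handle` to u32 will probably see the id today, but the documentation leaves the author free to change the representation without it being a breaking change.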
You are attempting to assert that the undefined behaviour caused by evaluating transmute on a Discriminant must have a definition. It does not; if it did, it would cease to be undefined behaviour. Those two possibilities are both very likely, but neither one is required by Rust. In fact, the document you cite is pretty clear about that: the list begins with a warning reading "The following list is not exhaustive," so your conclusion that it must be one of the items on the list does not follow from your otherwise reasonable analysis.
As of the version of Rust available on https://play.rust-lang.org/ today, the following program has an intuitive, reasonable behaviour in practice:
    use std::mem;

    enum Foo {
        A,
        B,
    }

    fn main() {
        let discrim = mem::discriminant(&Foo::B);
        let as_prim: u64 = unsafe {
            // Safety: none; this is deliberately exercising undefined behaviour.
            mem::transmute(discrim)
        };
        println!("{as_prim:?}");
    }
Specifically, it prints 1. It will likely always print that, indefinitely into the future. The program does not exhibit any surprising behaviour, at least for an intuitive meaning of the word "surprising," and probably never will.
It is nonetheless an undefined program. In point of fact there's no guarantee that this program produces output at all, or that it does so in finite time, or that it avoids side effects I might find harmful. And if the Rust implementors make a change to the implementation that changes the behaviour of my broken example program, then I get to keep both pieces, ultimately.
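For contrast, here is a sketch of ways to get at the same information that stay within documented behaviour. It assumes a field-less enum with an explicit `#[repr(u8)]`; the helper names `same_variant` and `numeric_discriminant` are made up:

```rust
use std::mem;

// Field-less enum with an explicit primitive representation.
#[derive(Clone, Copy)]
#[repr(u8)]
enum Foo {
    A,
    B,
}

/// Compares variants through the opaque `Discriminant` value,
/// which is what `mem::discriminant` is meant for.
fn same_variant(x: &Foo, y: &Foo) -> bool {
    mem::discriminant(x) == mem::discriminant(y)
}

/// Obtains the numeric discriminant with an `as` cast,
/// which is defined for field-less enums.
fn numeric_discriminant(x: Foo) -> u8 {
    x as u8
}
```

Neither function needs unsafe, so neither depends on what a Discriminant happens to contain.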
That was very clear, thank you! My understanding is that your description applies equally well to language UB and library UB, and the main difference is where the guarantees or lack of guarantees are documented. Is that correct?
UB is all about API specifications. The behavior for certain inputs is defined by the API spec, for others it is undefined.
This can apply to the language specification, std crate specification, or any other crate specification.
So UB is not about what the implementation of an API does, it's about what the specification guarantees. You can have UB in an API exported by a function without the function implementation encountering any UB in the lower level APIs that it in turn uses.
Rust has a special rule that only unsafe APIs should have undefined behavior. There is no mechanism enforcing this, so this is more of a "social" rule. We call APIs that break this rule "unsound". The language and std guarantee not to have such unsound APIs.
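A small sketch of that rule, using two made-up functions. The safe one is defined for every input; the unsafe one has inputs for which the specification gives no behaviour, and says so in a Safety section:

```rust
/// Safe and sound: every possible input has defined behaviour.
pub fn first(bytes: &[u8]) -> Option<u8> {
    bytes.first().copied()
}

/// Returns the first byte without a bounds check.
///
/// # Safety
///
/// `bytes` must be non-empty. Calling this with an empty slice is
/// undefined behaviour.
pub unsafe fn first_unchecked(bytes: &[u8]) -> u8 {
    // SAFETY: the caller promises that `bytes` has at least one element.
    unsafe { *bytes.get_unchecked(0) }
}
```

If `first_unchecked` were declared as a safe fn with the same body, it would be an unsound API in the sense described above: safe code could reach undefined behaviour just by passing an empty slice.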
In fact, the document you cite is pretty clear about that: the list begins with a warning reading "The following list is not exhaustive," so your conclusion that it must be one of the items on the list does not follow from your otherwise reasonable analysis.
A good reminder, yes!
Though, the situation still remains interesting. transmute has a safety precondition expressed as:
Both the argument and the result must be valid at their given type.
If the precondition of transmute was fully satisfied, behavior would not be undefined; therefore, since UB is declared here, one of the two precondition parts fails.
It is clear that Discriminant<T> is valid, since it is obtained through a safe, presumably sound API. That generally means that a bitwise-moved discriminant is not valid for primitive types! (Or that transmute does not list all its assumptions, which is somewhat possible given it doesn't have a "Safety" section at all.)
in order to declare that transmuting your struct is undefined, at least to consumers of your code, all you need to do is say so
I don't think that's right? If I wrote my function unsafe fn handle_my_struct(s: SecretBox) then I could indeed write whatever requirements to call it, and it would not be a breaking change to assume any of them. However, to make transmute UB I must fail its preconditions.
No, all it means is that it's not guaranteed to be. For example, it could contain padding or otherwise uninit bytes.
The difference here is between "language UB" and "library UB". Language UB is when you directly break a precondition of the language (like dereffing a null pointer). Library UB is when you break an invariant of a public API, or make your unsafe code rely on an undocumented and unpromised detail of it.
Library UB doesn't necessarily lead to language UB (just like transmute::<Discriminant, u64> might not be directly illegal), but it's still unsound to rely on it. That's because any change could break it, without warning or major version bump. So while the two are technically distinct, it's always best to treat them the same.
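Here is a minimal sketch of library UB that does not immediately become language UB. The `Even` type and its invariant are invented for illustration:

```rust
/// A `u32` that is documented to always be even (hypothetical type).
pub struct Even(u32);

impl Even {
    /// Checked constructor: upholds the invariant for all inputs.
    pub fn new(n: u32) -> Option<Even> {
        if n % 2 == 0 {
            Some(Even(n))
        } else {
            None
        }
    }

    /// Unchecked constructor.
    ///
    /// # Safety
    ///
    /// `n` must be even. Passing an odd value breaks the documented
    /// invariant of `Even`, which is library UB even though nothing
    /// in this function body is language UB.
    pub unsafe fn new_unchecked(n: u32) -> Even {
        Even(n)
    }

    /// Relies on the invariant: the division is exact for valid values.
    pub fn half(&self) -> u32 {
        self.0 / 2
    }
}
```

Calling `new_unchecked(3)` would "work" with this implementation, but a later version of the type could start relying on the invariant in unsafe code without a major version bump, which is why the two kinds are best treated the same.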