Indeed. References already have to point to valid objects on pain of UB, so a valid reference is only ever going to point to something within whatever address space the arch gives you.
Also, I don’t think the bit pattern alone can tell you whether a type is POD or not. E.g. add a Drop impl to a newtype wrapper around an integer, and now there’s more meaning behind the type than just its bits.
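A minimal sketch of that point, using a hypothetical `Handle` newtype: it has the same layout as a `u32` (guaranteed here by `#[repr(transparent)]`), so every bit pattern is still a valid `Handle`, yet `std::mem::needs_drop` reports that it carries drop glue, which the bare integer does not.

```rust
use std::mem::{needs_drop, size_of};

// Hypothetical newtype: same bits as a u32, but dropping it "means" something.
#[repr(transparent)]
struct Handle(u32);

impl Drop for Handle {
    fn drop(&mut self) {
        // Stand-in for releasing whatever resource the integer identifies.
        println!("releasing handle {}", self.0);
    }
}

fn main() {
    // Identical size to a u32, so the bits alone look the same...
    println!("size: {} vs {}", size_of::<Handle>(), size_of::<u32>());
    // ...but only the wrapper has drop semantics (and so can never be Copy).
    println!("u32 needs_drop: {}", needs_drop::<u32>());
    println!("Handle needs_drop: {}", needs_drop::<Handle>());
}
```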
Don't references also have to be aligned? At least, this would be implied by pointing to a valid instance. That means the low log2(align) bits of the address must be 0 whenever the alignment is greater than 1.
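To make the "low bits must be zero" point concrete, here is a small sketch. Alignments are always powers of two, so `align_of::<T>().trailing_zeros()` is exactly the number of low address bits a valid `&T` must have cleared. The helper names are made up for illustration.

```rust
use std::mem::align_of;

/// Hypothetical helper: how many low bits of a `&T` address are always zero
/// (log2 of the alignment, since alignments are powers of two).
fn guaranteed_zero_low_bits<T>() -> u32 {
    align_of::<T>().trailing_zeros()
}

/// Hypothetical helper: checks that a reference's address is a multiple of
/// the pointee's alignment.
fn is_aligned<T>(r: &T) -> bool {
    (r as *const T as usize) % align_of::<T>() == 0
}

fn main() {
    let x: u64 = 7;
    assert!(is_aligned(&x));
    println!("&u8 low zero bits: {}", guaranteed_zero_low_bits::<u8>());
    println!("&u64 low zero bits: {}", guaranteed_zero_low_bits::<u64>());
}
```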
f32/f64: yes, every bit pattern is valid; it’s just that a great many of them are NaN (for f32, 16,777,214 of the 2^32 patterns).
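A quick sketch showing that `f32::from_bits` is total (any `u32` converts, no `unsafe` needed) and where the NaN count comes from: a pattern is NaN when the 8 exponent bits are all ones and the 23 mantissa bits are not all zero, for either sign.

```rust
/// Number of f32 bit patterns that decode to NaN:
/// exponent all ones, nonzero mantissa (2^23 - 1 choices), times 2 signs.
fn f32_nan_pattern_count() -> u64 {
    2 * ((1u64 << 23) - 1)
}

fn main() {
    // from_bits accepts every u32; some results are just NaN or infinity.
    let samples = [0x0000_0000u32, 0x7f80_0000, 0x7fc0_0000, 0xffff_ffff];
    for bits in samples {
        let f = f32::from_bits(bits);
        println!("{bits:#010x} -> {f:?} (nan: {})", f.is_nan());
    }
    println!("{} of {} f32 patterns are NaN", f32_nan_pattern_count(), 1u64 << 32);
}
```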
It’s safe (no unsafe block needed) and reversible to cast a usize to any thin raw pointer type, so yes, raw pointers can hold any bit pattern. Only dereferencing one requires it to actually be valid.
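A minimal sketch of the round trip. The cast itself is ordinary safe code; the address used here is arbitrary and the pointer is never dereferenced, since doing so would be UB.

```rust
/// Casting an arbitrary integer to a thin raw pointer is safe code.
/// Dereferencing the result would need `unsafe` and a genuinely valid address.
fn int_to_ptr(addr: usize) -> *const u8 {
    addr as *const u8
}

fn main() {
    let addr: usize = 0xdead_beef; // arbitrary address, never dereferenced
    let p = int_to_ptr(addr);
    // Casting back yields the same integer.
    assert_eq!(p as usize, addr);
    println!("{:p}", p);
}
```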
As others have said, references need to be non-null, aligned, and point to a valid object of the type. (Aligned needs to be mentioned separately because of ZSTs, where any address holds a valid object in the “it can be read by ptr::read_unaligned” sense.)
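The ZST case can be sketched with `std::ptr::NonNull::dangling`, which returns a non-null, well-aligned pointer with no memory behind it. For a zero-sized type such as `[u64; 0]` that is enough to form a valid reference, while the alignment requirement (8 on most 64-bit targets) still applies to the address.

```rust
use std::mem::{align_of, size_of};
use std::ptr::NonNull;

/// A reference to a zero-sized type needs no backing memory, but it must
/// still be non-null and aligned. NonNull::dangling provides exactly that.
fn dangling_zst_ref() -> &'static [u64; 0] {
    // SAFETY: [u64; 0] has size 0, so any non-null, well-aligned pointer
    // refers to a valid (empty) value of the type.
    unsafe { &*NonNull::<[u64; 0]>::dangling().as_ptr() }
}

fn main() {
    let r = dangling_zst_ref();
    let addr = r as *const [u64; 0] as usize;
    assert_eq!(size_of::<[u64; 0]>(), 0);
    assert!(addr != 0);
    assert_eq!(addr % align_of::<[u64; 0]>(), 0);
    println!("ZST reference at {addr:#x}, alignment {}", align_of::<[u64; 0]>());
}
```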
For structs, what you say is technically only true for repr(C). repr(Rust) (the default) is technically allowed to include arbitrary extra information that matters for validity, should the compiler deem it necessary. (Not that this actually happens today in any situation I’m aware of.)
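As a sketch of the repr(C) case, assuming the claim being replied to is that a struct whose fields all accept every bit pattern does too: a repr(C) struct of two `u32`s has a fixed field order and no padding, so any 8 bytes form a valid value. The `Pair` type is hypothetical; a repr(C) struct with padding bytes would not qualify in the same way.

```rust
use std::mem::{size_of, transmute};

// Hypothetical struct: repr(C) fixes field order and offsets, and two u32
// fields leave no padding, so every 8-byte pattern is a valid Pair.
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq)]
struct Pair {
    a: u32,
    b: u32,
}

fn pair_from_bytes(bytes: [u8; 8]) -> Pair {
    // SAFETY: transmute checks at compile time that the sizes match, Pair has
    // no padding, and every bit pattern is a valid u32.
    unsafe { transmute::<[u8; 8], Pair>(bytes) }
}

fn main() {
    assert_eq!(size_of::<Pair>(), 8);
    let mut bytes = [0u8; 8];
    bytes[..4].copy_from_slice(&1u32.to_ne_bytes());
    bytes[4..].copy_from_slice(&2u32.to_ne_bytes());
    println!("{:?}", pair_from_bytes(bytes));
}
```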
For unions, I suspect the answer isn’t actually finalized yet. It depends on whether the rules end up defining the semantics in terms of which variant was assigned, since just splatting bits in wouldn’t make any of the variants active (in an official semantics sense, obviously not in a “something tracked in memory in release code” sense).
Partly, but also because I thought it was theoretically possible for optimization passes to make weird things happen when you violate rules like this. So what happens if I do:
```rust
let val: u8 = unsafe { std::mem::uninitialized() };
println!("{}", val);
```
Since every possible bit pattern is a valid u8, is this actually fine? Or is it still UB because the compiler assumes val is never assigned to and optimizes on that basis?
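For reference, `mem::uninitialized` is deprecated in favor of `std::mem::MaybeUninit`, whose documentation states that reading an uninitialized integer via `assume_init` is UB even though every bit pattern is valid. A minimal sketch of the sound pattern (the function name is hypothetical): keep the storage as `MaybeUninit` and only call `assume_init` after a write.

```rust
use std::mem::MaybeUninit;

/// Hypothetical helper: the storage stays MaybeUninit until it has
/// definitely been written, and only then is it read out as a u8.
fn read_after_write(value: u8) -> u8 {
    let mut slot = MaybeUninit::<u8>::uninit();
    slot.write(value);
    // SAFETY: `slot` was fully initialized by the write above.
    unsafe { slot.assume_init() }
}

fn main() {
    let val = read_after_write(42);
    println!("{}", val);
}
```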
I see; sorry, I misunderstood your point about alignment. I was showing that a reference to a struct member is unaligned with respect to the struct itself, but that’s kind of pointless. We’re aligned now (pun intended).
FWIW, the struct itself also only has 1-byte alignment. Aggregates are aligned to the maximum alignment of their members, unless you force it larger with #[repr(align(N))].
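A small sketch of that rule using `std::mem::align_of`. The struct names are made up, and they use `#[repr(C)]` because that representation guarantees the alignment is the maximum of the fields'; the default representation behaves the same way in practice.

```rust
use std::mem::align_of;

// Hypothetical structs for illustration.
#[allow(dead_code)]
#[repr(C)]
struct Bytes { a: u8, b: u8, c: u8 }

#[allow(dead_code)]
#[repr(C)]
struct Mixed { a: u8, b: u32 }

#[allow(dead_code)]
#[repr(align(8))]
struct Forced { a: u8 }

fn main() {
    // Only u8 fields, so the struct is 1-byte aligned.
    println!("Bytes: {}", align_of::<Bytes>());
    // Takes the alignment of its most-aligned member.
    println!("Mixed: {} (u32: {})", align_of::<Mixed>(), align_of::<u32>());
    // Raised explicitly with repr(align(N)).
    println!("Forced: {}", align_of::<Forced>());
}
```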
If the only consequence of the UB is that the example prints some arbitrary number, that’s very different from it allowing remote code execution, for instance. But I guess it’s best just to think of UB as a black box where anything (including the worst possible thing) can happen.
Right, it should be considered a black box. It’s certainly interesting to see what happens in practice on a particular compiler version, but that’s mostly just intellectual curiosity.
Just to be clear, both C and Rust treat undefined behavior as “anything goes”. The compiler is free to make literally anything happen, including (but not at all limited to) segfaults.
Although it’s been mentioned in many other threads, I haven’t seen it in this one: some architectures, most notably Itanium, detect and fault on reads of uninitialized memory. The fact that every bit pattern is valid for an initialized memory cell does not imply that the same cell can be read while uninitialized.
It is best to think of it that way. Sure, that specific toy program in the Playground might behave a particular way on a particular architecture, but what if the UB were part of a larger, actively developed program? Who knows what might happen as code is added, removed, or rearranged and the optimizer consequently gets a different view of the program.
And tbh even a black box approach isn’t sufficient. UB’s effects can potentially manifest anywhere in a program, not just at the site where the bad behavior is introduced.