I hear that reading the bytes of an object of another type through a char* in C++ is a kind of UB. I wonder whether we can do the same thing in Rust. For example:
fn main() {
    let c = f32::NAN;
    unsafe {
        let ptr = &c as *const f32 as *const u8;
        println!("{:08b}", *ptr);
        println!("{:08b}", *ptr.add(1));
        println!("{:08b}", *ptr.add(2));
        println!("{:08b}", *ptr.add(3));
    }
}
When I check this piece of code with Miri, it does not report any UB in the program. I'm not sure whether Miri can detect every kind of UB, and there is no specification of UB operations in Rust. Is this code well-defined?
It can't; Miri has false negatives but no false positives. So not being flagged by Miri doesn't guarantee correctness, but being flagged by Miri does guarantee UB.
As for your question: reading uninitialized and padding bytes is UB. Your particular code, for an initialized value of the specific type f32, is fine, because f32 doesn't have any padding bytes and the place is fully initialized.
Note that there are quite a few other things that come into play when re-interpreting bits like this, and it's not easy to get it right:
As mentioned, the place must be fully initialized, you are not allowed to read padding bytes or bytes of any uninitialized value.
The alignment of the pointer through which you are reading must be sufficient. For bytes, the alignment is always sufficient as byte pointers only need to be aligned to 1 byte boundaries, whereas f32 has alignment 4, but you couldn't e.g. read an array of 4 bytes and reinterpret it as a *const f32, because alignment would not be sufficient.
Pointer arithmetic is tricky; the naïve use of .add(n) is almost certainly not correct; you should uphold a great big number of safety invariants before being allowed to use ptr.add(...).
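For this particular case, the points above can be sidestepped almost entirely. A minimal sketch, assuming all you need is the byte representation of an f32 and the reverse conversion from a possibly unaligned buffer (the function names `f32_bytes` and `f32_from_unaligned` are made up for illustration):

```rust
/// Returns the bytes of an `f32` in native byte order, with no `unsafe`.
fn f32_bytes(x: f32) -> [u8; 4] {
    x.to_ne_bytes()
}

/// Rebuilds an `f32` from four bytes starting at `offset`, which may be unaligned.
fn f32_from_unaligned(buf: &[u8], offset: usize) -> f32 {
    // Slicing bounds-checks the range, so the pointer below is valid for 4 bytes.
    let window = &buf[offset..offset + 4];
    // SAFETY: `window` covers 4 initialized bytes, `read_unaligned` has no
    // alignment requirement, and every bit pattern is a valid `f32`.
    unsafe { std::ptr::read_unaligned(window.as_ptr() as *const f32) }
}

fn main() {
    let bytes = f32_bytes(f32::NAN);
    for b in bytes.iter() {
        println!("{:08b}", b);
    }

    // Store a float at an odd offset so the read back is deliberately unaligned.
    let mut buf = [0u8; 5];
    buf[1..].copy_from_slice(&f32_bytes(2.5));
    println!("{}", f32_from_unaligned(&buf, 1));
}
```

`to_ne_bytes` avoids raw pointers altogether, and `read_unaligned` covers the bytes-to-f32 direction, where a plain dereference of a cast pointer would need 4-byte alignment.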
It depends. It's unsound in the general case because of things like padding bytes, which may be uninitialized. Or just, you know, uninitialized data generally.
Rust doesn't have a complete specification, but some things are explicitly defined.
And some things are called out as definitely UB. Those things you can soundly rely on not having happened. Example: it's UB to create a NULL &T; you can rely on &T not being NULL for soundness, because if you received a NULL &T, UB has already occurred.
The rest is, well, also undefined... but not in a way you can count on never happening. There will always be some wiggle room or the language would stagnate or be forced to break backwards compatibility.
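One visible consequence of the non-NULL guarantee for &T is that Option<&T> needs no extra space for its discriminant. A small sketch (the helper name `option_ref_is_pointer_sized` is made up for illustration):

```rust
use std::mem::size_of;

/// Checks that `Option<&u8>` is exactly as large as `&u8`.
fn option_ref_is_pointer_sized() -> bool {
    // `None` can be stored as the null bit pattern because `&u8` is never null.
    size_of::<Option<&u8>>() == size_of::<&u8>()
}

fn main() {
    println!("{}", option_ref_is_pointer_sized());
}
```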
The UB case in C++ is called the strict aliasing rule. Rust doesn't care about it since it controls aliasing explicitly with shared (&) and exclusive (&mut) references.
Strict aliasing is not about exclusivity, it's about accessing a place through a pointer to a different type. C++ does not care about exclusivity, at all.
Also, as a special case, strict aliasing explicitly does not apply to reading the bytes of any object through char *.
That is UB because it violates the rules for pointer arithmetic:
When an expression J that has integral type is added to or subtracted from an expression P of pointer type, the result has the type of P.
If P evaluates to a null pointer value and J evaluates to 0, the result is a null pointer value.
Otherwise, if P points to an array element i of an array object x with n elements ([dcl.array]), the expressions P + J and J + P (where J has the value j) point to the (possibly-hypothetical) array element i+j of x if 0≤i+j≤n and the expression P - J points to the (possibly-hypothetical) array element i−j of x if 0≤i−j≤n.
It's not about exclusivity but about guaranteed non-aliasing. For example, if a function takes two pointers of different types, the compiler assumes they are not aliases of the same object and applies optimizations such as using memcpy instead of memmove.
Edit: and no, char* is not special-cased for strict aliasing. The only exception is void*.
Change the example to use (u16, u8) instead of f32 and you'll get UB:
error: Undefined Behavior: using uninitialized data, but this operation requires initialized memory
--> /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/fmt/num.rs:179:1
|
179 | integer! { i8, u8 }
| ^^^^^^^^^^^^^^^^^^^ using uninitialized data, but this operation requires initialized memory
|
= help: this indicates a bug in the program: it performed an invalid operation, and caused Undefined Behavior
= help: see https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html for further information
= note: BACKTRACE:
= note: inside `core::fmt::num::<impl std::fmt::Binary for u8>::fmt` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/fmt/num.rs:159:32: 159:37
= note: inside `core::fmt::rt::Argument::<'_>::fmt` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/fmt/rt.rs:138:9: 138:40
= note: inside `core::fmt::run` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/fmt/mod.rs:1162:5: 1162:19
= note: inside `std::fmt::write` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/fmt/mod.rs:1130:26: 1130:61
= note: inside `<std::io::StdoutLock<'_> as std::io::Write>::write_fmt` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/io/mod.rs:1763:15: 1763:43
= note: inside `<&std::io::Stdout as std::io::Write>::write_fmt` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/io/stdio.rs:726:9: 726:36
= note: inside `<std::io::Stdout as std::io::Write>::write_fmt` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/io/stdio.rs:700:9: 700:33
= note: inside `std::io::stdio::print_to::<std::io::Stdout>` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/io/stdio.rs:1018:21: 1018:47
= note: inside `std::io::_print` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/io/stdio.rs:1095:5: 1095:37
note: inside `main`
--> src/main.rs:8:7
|
8 | println!("{:08b}",*ptr.add(3));
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
= note: this error originates in the macro `int_base` which comes from the expansion of the macro `println` (in Nightly builds, run with -Z macro-backtrace for more info)
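The uninitialized byte Miri complains about is padding. A quick way to see that (u16, u8) contains a byte belonging to no field is to compare sizes; this is only a sketch, it assumes a target where u16 has alignment 2 (such as x86_64), and the helper name `tuple_padding_bytes` is made up:

```rust
use std::mem::{align_of, size_of};

/// Number of bytes in `(u16, u8)` that do not belong to any field.
fn tuple_padding_bytes() -> usize {
    size_of::<(u16, u8)>() - (size_of::<u16>() + size_of::<u8>())
}

fn main() {
    // The size is rounded up to a multiple of the alignment, which leaves
    // room for padding after the 3 bytes of field data.
    println!("size  = {}", size_of::<(u16, u8)>());
    println!("align = {}", align_of::<(u16, u8)>());
    println!("padding bytes = {}", tuple_padding_bytes());
}
```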
So, for a struct annotated with #[repr(C)] (i.e. a C struct), if I cast a pointer to the struct to *const u8, reading through the resulting pointer may cause UB if it points at padding bytes? And is it a correct way to avoid UB if I know the layout of the structure and skip all padding bytes (i.e. only read the byte sequences of the initialized fields)?
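To make the question concrete, here is a rough sketch of reading only the field bytes of a #[repr(C)] struct. The struct `Packet` and the function `field_bytes` are made up for illustration, and the offsets in the comments assume a target where u16 has alignment 2:

```rust
use std::mem::size_of;
use std::ptr::addr_of;

#[repr(C)]
struct Packet {
    id: u16, // offset 0, 2 bytes
    tag: u8, // offset 2, 1 byte; one trailing padding byte follows
}

/// Copies only the bytes that belong to fields and never touches padding.
fn field_bytes(p: &Packet) -> [u8; 3] {
    let base = p as *const Packet as *const u8;
    // Derive the offsets from the field addresses instead of hard-coding them.
    let id_off = addr_of!(p.id) as usize - base as usize;
    let tag_off = addr_of!(p.tag) as usize - base as usize;

    let mut out = [0u8; 3];
    // SAFETY: both offsets lie inside `*p`, the bytes read belong to
    // initialized fields, and `u8` reads have no alignment requirement.
    unsafe {
        std::ptr::copy_nonoverlapping(base.add(id_off), out.as_mut_ptr(), size_of::<u16>());
        out[2] = *base.add(tag_off);
    }
    out
}

fn main() {
    let p = Packet { id: 0x0102, tag: 7 };
    println!("{:?}", field_bytes(&p));
}
```

If no pointer tricks are actually required, calling `to_ne_bytes()` on each field gives the same bytes without any unsafe.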
Yes, but it's based on different types and not exclusivity. This is basically completely unlike Rust's alias analysis. Rust doesn't assume that raw pointers of different types don't alias; only that &muts (that aren't reborrows) don't alias.
They share the same motivation but have different interpretations. Many C programmers dislike strict aliasing because it seems very arbitrary. But for the compiler to apply advanced optimizations, it must have some level of control over, or assumptions about, aliasing. Rust has a more natural, though stricter, borrowing rule, so we don't need such an arbitrary limitation.
A potential use case is sending these bytes over TCP and reconstructing the value on the other side. Anyway, low-level access to object representations is permitted in Rust, right?
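For the network use case, one common approach is to encode each field explicitly in a fixed byte order, which never touches padding and does not depend on the host's endianness. A minimal sketch, with a made-up `Sample` struct and a made-up 7-byte little-endian wire layout:

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
struct Sample {
    id: u16,
    tag: u8,
    value: f32,
}

/// Encodes the fields one by one in little-endian order (hypothetical layout).
fn encode(s: &Sample) -> [u8; 7] {
    let mut out = [0u8; 7];
    out[0..2].copy_from_slice(&s.id.to_le_bytes());
    out[2] = s.tag;
    out[3..7].copy_from_slice(&s.value.to_le_bytes());
    out
}

/// Rebuilds a `Sample` from the 7-byte layout produced by `encode`.
fn decode(buf: &[u8; 7]) -> Sample {
    Sample {
        id: u16::from_le_bytes([buf[0], buf[1]]),
        tag: buf[2],
        value: f32::from_le_bytes([buf[3], buf[4], buf[5], buf[6]]),
    }
}

fn main() {
    let s = Sample { id: 513, tag: 7, value: 1.5 };
    let wire = encode(&s);
    println!("{:?} -> {:?}", wire, decode(&wire));
}
```

The resulting array can be written to a TcpStream with `write_all`; no unsafe is involved at any point.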
I think the alignment requirement imposed on pointers used to access the object should be more restrictive than what you have described.
For example
let i = 0i32;
unsafe {
    let ptr = &i as *const i32 as *const u16;
    let r = *ptr;
}
The alignment of i32 is 4 while the alignment of u16 is 2. Even though the address of i is more strictly aligned than u16 requires, *ptr should cause UB anyway.
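For comparison, here is the same read written with the alignment check spelled out. This is a sketch under the assumption, taken from the point made earlier in the thread that Rust does not treat raw pointers of different types as non-aliasing, that the only requirements are in-bounds, initialized, sufficiently aligned memory and a valid bit pattern; the function name `first_u16` is made up:

```rust
use std::mem::align_of;

/// Reads the first two bytes of an `i32` as a `u16` through a cast pointer.
fn first_u16(i: &i32) -> u16 {
    let ptr = i as *const i32 as *const u16;
    // On mainstream targets an address aligned for `i32` is also aligned for `u16`.
    debug_assert_eq!(ptr as usize % align_of::<u16>(), 0);
    // SAFETY: the two bytes are in bounds of `*i`, initialized, aligned for
    // `u16`, and every bit pattern is a valid `u16`.
    unsafe { *ptr }
}

fn main() {
    let i = 0x1234_5678_i32;
    // Which half of the integer this prints depends on the host's endianness.
    println!("{:#06x}", first_u16(&i));
}
```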
I'm referring to the strict aliasing rule of C++; I'm not sure whether it applies to Rust:
If a program attempts to access the stored value of an object through a glvalue whose type is not similar to one of the following types the behavior is undefined:
the dynamic type of the object,
a type that is the signed or unsigned type corresponding to the dynamic type of the object, or