What types have all valid bit patterns

Indeed. References already have to point to valid objects on pain of UB, so a valid reference is only ever going to point to something within whatever address space the arch gives you.

Also I don’t think that bit pattern alone can tell you whether a type is a POD or not. E.g. add a Drop impl to a newtype wrapper around an integer, and now there’s more meaning behind the type than just its bits.
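
To make the Drop point concrete, here is a minimal sketch. `Handle` is a made-up newtype for illustration (not something from this thread); `#[repr(transparent)]` is added only so that the "same layout as u32" claim is guaranteed rather than incidental.

```rust
use std::mem;

// Hypothetical newtype: bit-for-bit a u32, but it carries a Drop impl.
#[repr(transparent)]
struct Handle(u32);

impl Drop for Handle {
    fn drop(&mut self) {
        // Imagine this releases some external resource identified by self.0.
        let _id = self.0;
    }
}

// True if Handle and u32 agree on size and alignment.
fn same_layout() -> bool {
    mem::size_of::<Handle>() == mem::size_of::<u32>()
        && mem::align_of::<Handle>() == mem::align_of::<u32>()
}

fn main() {
    println!("same size and alignment: {}", same_layout());
    println!("u32 needs_drop:    {}", mem::needs_drop::<u32>());
    println!("Handle needs_drop: {}", mem::needs_drop::<Handle>());
}
```

The bits are identical, yet copying a `Handle` bitwise would run its destructor twice, so "all bit patterns valid" and "plain old data" are separate questions.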

Don't references also have to be aligned? At least, this would be implied by pointing to a valid instance. That means some of the LSBs must be 0 if the alignment is greater than 1.

Ah yeah, that would probably be true too.

  • f32/f64: yes, everything is valid, just a great deal of them are NAN
  • It’s safe and reversible to cast a usize to any (thin) raw pointer type, so yes, raw pointers can have any bit pattern
  • As others have said, references need to be non-null, aligned, and point to a valid object of the type. (Aligned needs to be mentioned separately because of ZSTs, where anywhere is a valid object in the “it can be read by ptr::read_unaligned” sense.)
  • For structs what you say is technically only true for repr(C); repr(Rust) (the default) is technically allowed to include arbitrary, important extra information should the compiler deem it necessary. (Not that that actually happens today in any situation of which I’m aware.)
  • unions: I suspect the answer isn’t actually finalized yet, since it depends on what the rules end up being around whether the semantics are defined in terms of which variant was assigned, as just splatting bits in wouldn’t set any of the variants as active (in an official semantics sense, obviously not in a “something tracked in release code in memory” sense).
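
The first two bullets can be checked from safe code. This is only a sketch; the helper names `decodes_to_nan` and `roundtrip_addr` are made up for illustration, and the pointer is never dereferenced.

```rust
// Any u32 is accepted by f32::from_bits; this reports whether it decodes to NaN.
fn decodes_to_nan(bits: u32) -> bool {
    f32::from_bits(bits).is_nan()
}

// Cast an arbitrary integer to a thin raw pointer and back again.
// Creating the pointer is safe; only dereferencing it would need `unsafe`.
fn roundtrip_addr(addr: usize) -> usize {
    let p = addr as *const u64;
    p as usize
}

fn main() {
    // Exponent bits all ones plus a non-zero mantissa means NaN.
    println!("0x7fc00000 is NaN: {}", decodes_to_nan(0x7fc0_0000));
    println!("0xffffffff is NaN: {}", decodes_to_nan(0xffff_ffff));
    // 0x3f800000 is the bit pattern of 1.0.
    println!("0x3f800000 is NaN: {}", decodes_to_nan(0x3f80_0000));

    // Deliberately odd address: a raw pointer does not have to be aligned.
    let addr = 0x1235usize;
    println!("round trip ok: {}", roundtrip_addr(addr) == addr);
}
```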

Partly, but also partly because I thought it was theoretically possible for optimization passes to make weird UB happen when you violate things like this. So what happens if I do:

let val: u8 = unsafe { mem::uninitialized() };
println!("{}", val);

Because every possible bit of data is valid, is this not UB? Or is it still UB because the compiler assumes val is never assigned to and optimizes it away?

Pretty sure it’s still UB. Try running this program in both debug mode and release mode and you’ll see some interesting behavior:

https://play.rust-lang.org/?gist=37213939082966d9853389a2cba7a865&version=stable&mode=debug&edition=2015

Yup that’s UB. You’ll find the discussion in this recent thread relevant: How to allocate huge byte array safely
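
For contrast, a sketch of the non-UB way to get the same effect. This uses `std::mem::MaybeUninit`, which nobody in this thread brought up, so treat it as my assumption about what you'd reach for; `mem::uninitialized` has since been deprecated in its favor. The helper name `init_then_read` is made up.

```rust
use std::mem::MaybeUninit;

// Reserve space for a u8 without initializing it, write a value,
// and only then treat the slot as an initialized u8.
fn init_then_read(value: u8) -> u8 {
    let mut slot = MaybeUninit::<u8>::uninit();
    slot.write(value);
    // SAFETY: `slot` was fully initialized by the `write` call above.
    unsafe { slot.assume_init() }
}

fn main() {
    println!("{}", init_then_read(42));
}
```

Calling `assume_init` before the `write` would be the same UB as the snippet above, even though every bit pattern of an initialized u8 is valid.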

Nope.

What are you trying to show? u8 has alignment 1 – but try u16 and you’ll see their LSB=0.

(Bringing unsafe to this kind of question is shaky, but you could just debug-print their pointers instead of using transmute.)
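
Something along these lines would do it without transmute. It's a sketch; `is_aligned` is a made-up helper, and the actual addresses printed will differ from run to run.

```rust
use std::mem::align_of;

// True if the address behind `r` is a multiple of T's alignment.
fn is_aligned<T>(r: &T) -> bool {
    (r as *const T as usize) % align_of::<T>() == 0
}

fn main() {
    let bytes = [1u8, 2, 3, 4];
    let words = [1u16, 2, 3, 4];

    // u8 has alignment 1, so consecutive elements sit at odd and even addresses.
    for b in &bytes {
        println!("u8  at {:p}, aligned: {}", b, is_aligned(b));
    }

    // u16 has alignment 2, so the lowest address bit of every reference is 0.
    for w in &words {
        let addr = w as *const u16 as usize;
        println!("u16 at {:p}, low bit: {}", w, addr & 1);
    }
}
```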

AFAIK, references must be aligned. Nomicon lists unaligned ptr read/writes as UB, so that would certainly carry over to references.

A “future” rustc version may decide to get clever and store data in the alignment bits.

I see, sorry I misunderstood your point about alignment. I was showing that a reference to a struct member is unaligned WRT the struct itself, but that’s kind of pointless. We’re aligned now (pun intended).

FWIW, the struct itself also only has 1-byte alignment. Aggregates are aligned to the maximum alignment of their members, unless you force it larger with #[repr(align(N))].
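
A quick sketch of that rule. The struct names are made up, and I've used `#[repr(C)]` on the first two so that "alignment equals the maximum member alignment" is a documented guarantee rather than just what the default layout happens to do.

```rust
use std::mem::align_of;

// Only u8 members: alignment 1.
#[allow(dead_code)]
#[repr(C)]
struct Bytes {
    a: u8,
    b: u8,
    c: u8,
}

// The u32 member raises the alignment of the whole struct.
#[allow(dead_code)]
#[repr(C)]
struct Mixed {
    a: u8,
    b: u32,
}

// Alignment forced above what the single u8 member needs.
#[allow(dead_code)]
#[repr(align(16))]
struct Forced {
    a: u8,
}

fn main() {
    println!("Bytes:  {}", align_of::<Bytes>());
    println!("Mixed:  {} (u32 is {})", align_of::<Mixed>(), align_of::<u32>());
    println!("Forced: {}", align_of::<Forced>());
}
```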

That makes perfect sense. :+1:

It's UB, but would it cause a segfault?

By definition, “anything” can happen. Curious why you’re asking about segfault specifically?

If the only UB is that the example prints any number, that’s different to it allowing remote code execution, for instance. But I guess that it’s best just to think of UB as a black box where anything (including the worst possible thing) can happen.

Right, it should be considered a black box. It’s certainly interesting to see what happens in practice on a particular compiler version, but that’s mostly just intellectual curiosity.

Just to make it clear, both C and Rust consider undefined behavior to be anything goes. The compiler is free to make literally anything happen including, but not at all limited to, segfaults.

Although it’s been mentioned in many other threads, I haven’t seen it in this one. Some architectures, most notably Itanium, detect and fault on [edit: read] references to uninitialized memory. The fact that all bit patterns in an initialized memory cell are valid does not imply that the same memory cell can be read when uninitialized.

It is best to think that. Sure, that specific toy program in the Playground might behave a particular way on a particular architecture, but what if the UB were part of a larger actively developed program? Who knows what might happen as code is added or removed or rearranged and the optimizer consequently gets a different view of the program.

And tbh even a black box approach isn’t sufficient. UB’s effects can potentially manifest anywhere in a program, not just at the site where the bad behavior is introduced.
