Let's define a property POD (plain old data), meaning that the data is a contiguous block in memory and all bit-patterns of the data are valid.
- i128 has this property.
- f64 does, I think, but I'm not 100% sure.
- raw pointers (*const) - I don't really have a clue. My intuition says no: although they are conceptually unsigned ints, is it guaranteed that they are in that format? Maybe we have to treat these as opaque.
- structs - I feel like these should be if all their fields are. Even if there is padding, you don't care what's in it anyway.
- enums - These can't be unless, for example, you use a u8 discriminant and have exactly 256 variants, so best to just assume they are not.
- zero-sized types and ! - I'd say these are, since they have 1 and 0 representations respectively, but it's probably academic.
- other pointers - I assume these aren't, since raw pointers aren't.
- unions - Yes, if all variants have this property.
- any other types I haven't thought of (I'm just avoiding trait objects since they can't be)
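To make the "all bit patterns are valid" idea concrete, here is a minimal sketch in safe Rust. The helper names `f64_from_any_bits` and `checked_bool` are made up for illustration. It relies on `f64::from_bits` and `i128::from_ne_bytes` accepting any input, and uses bool as an example of a type where most bit patterns are not valid.

```rust
/// Reinterprets any 64 bits as an f64. This is safe because every bit
/// pattern is a valid f64 (some of them are NaN).
fn f64_from_any_bits(bits: u64) -> f64 {
    f64::from_bits(bits)
}

/// Converts a byte to a bool only when the byte is a valid bool.
/// Only 0 and 1 are valid; 2..=255 are invalid bit patterns for bool.
fn checked_bool(byte: u8) -> Option<bool> {
    match byte {
        0 => Some(false),
        1 => Some(true),
        _ => None,
    }
}

fn main() {
    // All-ones is a valid i128 (it is -1) and a valid f64 (a NaN).
    let all_ones = i128::from_ne_bytes([0xFF; 16]);
    let float = f64_from_any_bits(u64::MAX);
    println!("i128 from all-ones bytes: {all_ones}");
    println!("f64 from all-ones bits is NaN: {}", float.is_nan());

    // The same cannot be said for bool.
    println!("bool from byte 2: {:?}", checked_bool(2));
}
```

The checked conversion is the safe alternative to transmuting a byte into a bool, which would be UB for any value other than 0 or 1.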
I want to understand this property since it is important for writing unsafe code, and understanding the behavior of unions. Also, it seems that the compiler can assume that invalid bit patterns never exist - meaning that you can get weird UB (that might randomly appear in a later LLVM) if you do things like take references to it, even if those references are never read. Is this true? I don't feel like I really understand.
Raw pointers don't have to point to a valid type instance and can be null, so I would imagine that any bit pattern works for them. References can't be null and have to point to a valid type instance, but I don't imagine there's any constraint on their bit pattern other than != 0.
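One observable consequence of "references can't be null" is that the compiler can use the all-zero pattern to represent None. A small sketch (the helper name `ref_option_is_pointer_sized` is invented here) compares sizes. The size of Option<*const u8> is only printed, not asserted, since it is just what current compilers happen to produce.

```rust
use std::mem::size_of;

/// True when Option<&u8> needs no extra space over &u8, i.e. the null
/// bit pattern is free to encode None.
fn ref_option_is_pointer_sized() -> bool {
    size_of::<Option<&u8>>() == size_of::<&u8>()
}

fn main() {
    println!("&u8:               {} bytes", size_of::<&u8>());
    println!("Option<&u8>:       {} bytes", size_of::<Option<&u8>>());
    // Raw pointers may legitimately be null, so there is no spare
    // pattern for None here; this usually prints a larger size.
    println!("Option<*const u8>: {} bytes", size_of::<Option<*const u8>>());

    // A null raw pointer is a perfectly valid value (just not dereferenceable).
    let p: *const u8 = std::ptr::null();
    println!("null raw pointer is fine to hold: {}", p.is_null());
    println!("Option<&u8> is pointer-sized: {}", ref_option_is_pointer_sized());
}
```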
I guess what you're asking about is converting between data types. The nomicon covers the topic in some good detail:
Interestingly, it depends on the target architecture! To quote from the wikipedia article on x86-64:
Although virtual addresses are 64 bits wide in 64-bit mode, current implementations (and all chips known to be in the planning stages) do not allow the entire virtual address space of 2^64 bytes (16 EB) to be used. ... in the first implementations of the architecture, only the least significant 48 bits of a virtual address would actually be used in address translation (page table lookup).
Canonical form addresses run from 0 through 00007FFF'FFFFFFFF, and from FFFF8000'00000000 through FFFFFFFF'FFFFFFFF
This means that there is a range of values for which a reference is known to be invalid on x86-64 (at least in current implementations).
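As a rough sketch of the canonical-form rule quoted above, assuming the 48-bit implementations the quote describes: an address is canonical when bits 48..63 are all copies of bit 47. The function name `is_canonical_48` is made up for this example, and it only does arithmetic on integers; it never touches a real pointer.

```rust
/// Returns true if `addr` is in canonical form for a 48-bit virtual
/// address space, i.e. bits 48..=63 all equal bit 47.
fn is_canonical_48(addr: u64) -> bool {
    // Shift the upper 16 bits out, then arithmetic-shift back so that
    // bit 47 is sign-extended into them.
    let sign_extended = ((addr << 16) as i64 >> 16) as u64;
    sign_extended == addr
}

fn main() {
    let samples: [u64; 4] = [
        0x0000_7FFF_FFFF_FFFF, // top of the lower canonical range
        0x0000_8000_0000_0000, // first non-canonical address
        0xFFFF_7FFF_FFFF_FFFF, // last non-canonical address
        0xFFFF_8000_0000_0000, // bottom of the upper canonical range
    ];
    for addr in samples {
        println!("{addr:#018x} canonical: {}", is_canonical_48(addr));
    }
}
```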
Yeah, I thought about bringing up the x64 thing but that's more of a side-effect of the way the architecture decides to treat pointers, isn't it? I imagine Rust itself would be perfectly happy to handle pointers and references with those bit patterns, but it so happens that the architecture will never give it any to play with (modulo pointer tagging shenanigans I guess).
One could also imagine a compiler feature that has knowledge of this particular architecture implementation detail, and protects against compiling code that violates it. But thinking about it, I'm not sure it would be any more useful than the current unsafe keyword (which is required for touching raw pointers). After all, it should not be possible to create references to the non-canonical address range in safe code.
Indeed. References already have to point to valid objects on pain of UB, so a valid reference is only ever going to point to something within whatever address space the arch gives you.
Also I don't think that bit pattern alone can tell you whether a type is a POD or not. For example, add a Drop impl to a newtype wrapper around an integer, and now there's more meaning behind the type than just its bits.
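A minimal sketch of that point, using a hypothetical `Handle` newtype (the name and the printed message are invented for illustration): it has exactly the bits of a u32, yet dropping it does something, so copying its bytes around freely would duplicate that side effect.

```rust
use std::mem::{needs_drop, size_of};

/// Hypothetical newtype: same layout as a u32 (repr(transparent) is
/// used here to guarantee that), but dropping it has a side effect.
#[repr(transparent)]
struct Handle(u32);

impl Drop for Handle {
    fn drop(&mut self) {
        println!("releasing handle {}", self.0);
    }
}

fn main() {
    // Bit-for-bit the two types are indistinguishable...
    assert_eq!(size_of::<Handle>(), size_of::<u32>());

    // ...but only one of them carries drop glue.
    println!("u32 needs drop:    {}", needs_drop::<u32>());
    println!("Handle needs drop: {}", needs_drop::<Handle>());

    let _h = Handle(7); // the Drop impl runs when _h goes out of scope
}
```

A type with a Drop impl also cannot implement Copy, which is one way the compiler itself separates "just bits" from "bits with meaning".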
Don't references also have to be aligned? At least, this would be implied by pointing to a valid instance. That means some of the LSBs must be 0 if the alignment is greater than 1.
Ah yeah, that would probably be true too.
Partly, but also partly because I thought it was theoretically possible for optimization passes to make weird UB happen when you violate things like this. So what happens if I do:
let val: u8 = mem::uninitialized();
Because every possible bit of data is valid, is this not UB? Or is it still UB because the compiler assumes val is never assigned to and optimizes it away?
Pretty sure it's still UB. Try running this program in both debug mode and release mode and you'll see some interesting behavior:
Yup that’s UB. You’ll find the discussion in this recent thread relevant: How to allocate huge byte array safely - #42 by scottmcm
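For comparison, here is a small sketch of the usual way to hold not-yet-initialized memory without UB, using std::mem::MaybeUninit. The function name `init_then_read` and the value 42 are arbitrary choices for the example. The point is that the value is only read after it has actually been written.

```rust
use std::mem::MaybeUninit;

/// Reserves space for a u8, initializes it, and only then reads it.
fn init_then_read() -> u8 {
    // No u8 value exists yet; the slot is allowed to be uninitialized.
    let mut slot = MaybeUninit::<u8>::uninit();

    // Initialize the slot before anything reads it.
    slot.write(42);

    // SAFETY: `slot` was fully initialized by the `write` call above.
    unsafe { slot.assume_init() }
}

fn main() {
    println!("value: {}", init_then_read());
}
```

Calling assume_init before the write would bring back the same UB as the snippet above, even for u8.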
What are you trying to show?
u8 has alignment 1 -- but try u16 and you'll see their LSB=0.
unsafe to this kind of question is shaky, but you could just debug-print their pointers instead of using
AFAIK, references must be aligned. Nomicon lists unaligned ptr read/writes as UB, so that would certainly carry over to references.
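The alignment claim can be checked from safe code by looking at addresses. A short sketch, with `addr_is_aligned` as a made-up helper name: it converts a reference to an integer address and tests it against the type's alignment.

```rust
use std::mem::align_of;

/// Returns true if the address behind `r` is a multiple of T's alignment.
/// For any reference obtained in safe code this is expected to hold.
fn addr_is_aligned<T>(r: &T) -> bool {
    (r as *const T as usize) % align_of::<T>() == 0
}

fn main() {
    let values: [u16; 4] = [1, 2, 3, 4];
    for v in &values {
        // With a 2-byte alignment, the lowest address bit is always 0.
        println!("{:p} aligned: {}", v, addr_is_aligned(v));
    }

    let byte: u8 = 0;
    // u8 has alignment 1, so any address qualifies.
    println!("{:p} aligned: {}", &byte, addr_is_aligned(&byte));
}
```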
A “future” rustc version may decide to get clever and store data in the alignment bits.
I see, sorry I misunderstood your point about alignment. I was showing that a reference to a struct member is unaligned WRT the struct itself, but that's kind of pointless. We're aligned now (pun intended).
FWIW, the struct itself also only has 1-byte alignment. Aggregates are aligned to the maximum alignment of their members, unless you force it larger with
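A quick way to see the "maximum alignment of the members" rule is to ask the compiler. This sketch uses two hypothetical structs, `Bytes` and `Mixed`, and marks them repr(C) so that the layout is fully specified (with the default representation the alignment is only guaranteed to be at least the maximum of the fields).

```rust
use std::mem::{align_of, size_of};

/// Only u8 fields, so the whole struct has 1-byte alignment.
#[repr(C)]
#[allow(dead_code)]
struct Bytes {
    a: u8,
    b: u8,
}

/// The u32 field raises the alignment of the whole struct.
#[repr(C)]
#[allow(dead_code)]
struct Mixed {
    a: u8,
    b: u32,
}

fn main() {
    println!("Bytes: align {}, size {}", align_of::<Bytes>(), size_of::<Bytes>());
    println!("Mixed: align {}, size {}", align_of::<Mixed>(), size_of::<Mixed>());
}
```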
That makes perfect sense.
It's UB, but would it cause a segfault?
By definition, “anything” can happen. Curious why you’re asking about segfault specifically?