The following code gives an error in Miri (playground):
#[repr(C)]
struct Foo {
    x: u8,
}

#[repr(C)]
struct Bar {
    x: u8,
    y: u8,
}

fn main() {
    unsafe {
        // Typed as u8 so the array elements match the one-byte struct fields.
        let data = &[10u8, 20];
        let foo_ref: &Foo = &*(data.as_ptr() as *const Foo);
        let bar_ref: &Bar = &*(foo_ref as *const Foo as *const Bar);
        println!("bar_ref.y: {}", bar_ref.y);
    }
}
Output:
Compiling playground v0.0.1 (/playground)
Finished dev [unoptimized + debuginfo] target(s) in 0.63s
Running `/playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/bin/cargo-miri runner target/miri/x86_64-unknown-linux-gnu/debug/playground`
error: Undefined Behavior: trying to retag from <2675> for SharedReadOnly permission at alloc3[0x1], but that tag does not exist in the borrow stack for this location
--> src/main.rs:17:29
|
17 | let bar_ref: &Bar = &*(foo_ref as *const Foo as *const Bar);
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| |
| trying to retag from <2675> for SharedReadOnly permission at alloc3[0x1], but that tag does not exist in the borrow stack for this location
| this error occurs as part of retag at alloc3[0x0..0x2]
|
= help: this indicates a potential bug in the program: it performed an invalid operation, but the Stacked Borrows rules it violated are still experimental
= help: see https://github.com/rust-lang/unsafe-code-guidelines/blob/master/wip/stacked-borrows.md for further information
help: <2675> was created by a SharedReadOnly retag at offsets [0x0..0x1]
--> src/main.rs:17:32
|
17 | let bar_ref: &Bar = &*(foo_ref as *const Foo as *const Bar);
| ^^^^^^^
= note: BACKTRACE (of the first span):
= note: inside `main` at src/main.rs:17:29: 17:68
note: some details are omitted, run with `MIRIFLAGS=-Zmiri-backtrace=full` for a verbose backtrace
error: aborting due to previous error
Is this a false positive from Miri or is this code actually unsound? Doesn't the entirety of bar_ref point to valid memory?
AFAIK once you cast to a reference (that is, the &* when you create foo_ref) you lose access to anything outside the range that a &Foo is allowed to access. Use only raw pointers if you want to do these kinds of casts, and avoid mixing raw pointers and references if you can.
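A minimal sketch of the raw-pointer-only version, assuming the buffer really holds at least `size_of::<Bar>()` initialized bytes. The helper name `read_y` is made up for illustration. Since no `&Foo` is ever created, nothing narrows the pointer's access to a single byte, so I'd expect Miri to accept it:

```rust
use std::mem::size_of;

#[allow(dead_code)]
#[repr(C)]
struct Foo {
    x: u8,
}

#[allow(dead_code)]
#[repr(C)]
struct Bar {
    x: u8,
    y: u8,
}

// Hypothetical helper: reads `Bar::y` out of a byte buffer using raw pointers only.
fn read_y(data: &[u8]) -> u8 {
    assert!(data.len() >= size_of::<Bar>());
    // Raw-pointer casts keep the provenance of `data`, which covers the whole slice.
    let foo_ptr = data.as_ptr() as *const Foo;
    let bar_ptr = foo_ptr as *const Bar;
    // SAFETY: the length was checked above and `Bar` has alignment 1.
    unsafe { (*bar_ptr).y }
}

fn main() {
    println!("bar.y: {}", read_y(&[10, 20]));
}
```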
Thanks, that's very unfortunate. I was also hoping to use things like Foo/Bar as trait objects, which seems to require the use of references, i.e. I don't think you can actually call any methods on *const dyn Trait without casting to a reference and running into this issue. Specifically, Foo would be like a header containing a length, and you could then recast it as a larger struct like Bar or Baz once you know the length.
Is there any way to use pointers as trait objects without another level of indirection/allocation? Or another solution that would work here instead?
Miri doesn't have false positives. If it flags your code, it's unsound. (Disclaimer: I know some of the rules are not fully settled. It's better to be conservative.)
You can't just cast pointers to arbitrary types and reinterpret the referent. Things like alignment and provenance are a big part of the memory model of the language. Pointers and references are not just indices into a huge array of bytes; that mental model of how this language treats memory is too simplistic. (I.e., Rust doesn't "compile to the hardware": you need to obey the abstract rules of the language at all times, and any hardware-based arguments are invalid and/or irrelevant when discussing the semantics.)
One thing you can do is add an unsized field to the end of Foo to represent the trailing data:
#![feature(ptr_metadata)]

#[repr(C)]
struct Foo {
    x: u8,
    data: [Bar]
}

#[repr(C)]
struct Bar {
    x: u8,
    y: u8,
}

fn main() {
    let data: &'static [u8] = &[10, 20, 30];
    let foo_ref: &Foo = unsafe {
        // On stable, you have to do this with something like `transmute`
        // instead, and I don't know the right way to make it happen...
        &*(std::ptr::from_raw_parts(data.as_ptr() as *const (), 1))
    };
    let bar_ref: &Bar = &foo_ref.data[0];
    println!("bar_ref.y: {}", bar_ref.y);
}
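For what it's worth, here is one way this might be done on stable, as a sketch rather than a vetted recipe: build a slice pointer with `std::ptr::slice_from_raw_parts` so it carries the element count, then cast it to `*const Foo`, which has the same kind of pointer metadata. The function name `view_as_foo` and the length check are my own additions:

```rust
use std::mem::size_of;
use std::ptr;

#[allow(dead_code)]
#[repr(C)]
struct Bar {
    x: u8,
    y: u8,
}

#[allow(dead_code)]
#[repr(C)]
struct Foo {
    x: u8,
    data: [Bar],
}

// Hypothetical helper: views `bytes` as a `Foo` whose tail holds `bar_count` items.
fn view_as_foo(bytes: &[u8], bar_count: usize) -> Option<&Foo> {
    // One header byte plus `bar_count` two-byte entries must fit in the buffer.
    let needed = bar_count
        .checked_mul(size_of::<Bar>())?
        .checked_add(size_of::<u8>())?;
    if bytes.len() < needed {
        return None;
    }
    // The slice pointer only supplies the length metadata; the address is the
    // start of the buffer, and the cast to `*const Foo` keeps that metadata.
    let fat = ptr::slice_from_raw_parts(bytes.as_ptr() as *const Bar, bar_count) as *const Foo;
    // SAFETY: the buffer is long enough, every byte is initialized, and both
    // `Foo` and `Bar` have alignment 1 and no padding.
    Some(unsafe { &*fat })
}

fn main() {
    let bytes = [10u8, 20, 30];
    let foo = view_as_foo(&bytes, 1).expect("buffer too short");
    println!("foo.data[0].y: {}", foo.data[0].y);
}
```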
Or, if you want to read the number of items out of the header:
Thanks to everyone who responded, all of your replies have been helpful! Now I have just a few questions:
As I understand it, the problem with my original code was: if you cast from a reference to a pointer, you can only access memory that was within the bounds of that reference. Is this correct? This would also mean you can't access any padding bytes, correct?
Assuming proper size and alignment, is it ok to use pointers to reinterpret memory between &[u8] and a properly-aligned #[repr(C)] struct? What about between different #[repr(C)] structs? I'm aware that some of the rules are currently unsettled; thus, I'm wondering what are the specific restrictions and guarantees at this time? The reference and nomicon are pretty vague. This kind of type punning is usually UB in C++ without -fno-strict-aliasing, but from what I can tell, it should be ok in Rust?
Is it safe to create a reference to a #[repr(C, packed)] struct, without worrying about alignment? I know that it's unsafe to reference any fields, but what about creating a reference to the struct itself?
I appreciate the responses about trait shadowing and using entirely raw pointers; I am wondering if there is a safe approach, which doesn't require extra indirection, to recast memory and still support trait objects in stable Rust. For example, is this safe?:
use std::ptr::{read_unaligned, addr_of};
use std::fmt;

#[repr(C, packed)]
struct Header {
    struct_type: u8,
}

#[repr(C, packed)]
struct StructType1 {
    struct_type: u8,
    foo: u16,
    // ...
}

#[repr(C, packed)]
struct StructType2 {
    struct_type: u8,
    bar: i64,
    // ...
}

impl fmt::Debug for StructType1 {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        unsafe {
            let ptr = self as *const StructType1;
            f.debug_struct("StructType1")
                .field("struct_type", &read_unaligned(addr_of!((*ptr).struct_type)))
                .field("foo", &read_unaligned(addr_of!((*ptr).foo)))
                .finish()
        }
    }
}

unsafe fn to_dyn_debug<'a>(buf: &'a [u8]) -> &'a dyn fmt::Debug {
    // assume `buf` is very large
    let buf_ptr = buf.as_ptr();
    let header_ptr = buf_ptr as *const Header;
    let struct_type = read_unaligned(addr_of!((*header_ptr).struct_type));
    if struct_type == 1 {
        &*(buf_ptr as *const StructType1 as *const dyn fmt::Debug)
    } else {
        // ...
    }
}
Are there any problems with either the recasting, or aliasing with multiple types at once?
The bounds statement is correct. You could write over the padding bytes, but you can't read padding bytes, as they may be uninitialized memory. To read bytes that were padding in some type or another, you would have to be 100% sure they were written to first, e.g. by writing the bytes through MaybeUninit. (It doesn't matter if the rest of the type was initialized, as a valid way to write a type with padding is to write only the non-padding parts.)
It's possible with enough work, though it's much nicer when bytemuck applies. I don't know of anywhere that summarizes all the restrictions and guarantees. Aliasing is its own topic on top of what's been discussed here; in extreme brevity:
don't create a &mut unless nothing can read or write the same memory for the duration (it promises exclusivity)
don't create a & unless nothing can write the same memory
unless you use UnsafeCell directly or indirectly in some sound way
Rust has explicitly chosen to not have a way to globally disable the aliasing requirements of references such as -fno-strict-aliasing.
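To make the UnsafeCell exception concrete, here is a tiny sketch using `std::cell::Cell`, which wraps `UnsafeCell` internally. The function name `bump_through_alias` is made up. Both parameters are shared references and may point at the same cell, yet writing through one and reading through the other is fine:

```rust
use std::cell::Cell;

// Hypothetical helper: writes through `alias`, then reads through `counter`.
// The two shared references are allowed to refer to the same `Cell`.
fn bump_through_alias(counter: &Cell<u8>, alias: &Cell<u8>) -> u8 {
    alias.set(alias.get().wrapping_add(1));
    counter.get()
}

fn main() {
    let c = Cell::new(1u8);
    println!("after bump: {}", bump_through_alias(&c, &c));
}
```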
Yes. (If you can do it in safe code, it's safe.)
No, it's pulling the vtable pointer out of nowhere. You should create a reference to the concrete type, and then type-erase that (let the compiler coerce it).
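A sketch of what "create a reference to the concrete type, then let the compiler coerce it" could look like. It assumes a single record type with tag value 1 and adds a length check that the question's code did not have; returning `Option` is my choice for the elided branch, not something from the thread:

```rust
use std::fmt;
use std::mem::size_of;

#[repr(C, packed)]
struct StructType1 {
    struct_type: u8,
    foo: u16,
}

impl fmt::Debug for StructType1 {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Copying a packed field out by value is allowed in safe code;
        // only taking a reference to the field is rejected.
        let struct_type = self.struct_type;
        let foo = self.foo;
        f.debug_struct("StructType1")
            .field("struct_type", &struct_type)
            .field("foo", &foo)
            .finish()
    }
}

fn to_dyn_debug(buf: &[u8]) -> Option<&dyn fmt::Debug> {
    if buf.len() < size_of::<StructType1>() {
        return None;
    }
    match buf[0] {
        1 => {
            // SAFETY: the buffer is long enough, every byte is initialized,
            // and the packed struct has alignment 1 and no padding.
            let typed: &StructType1 = unsafe { &*(buf.as_ptr() as *const StructType1) };
            // Unsizing coercion: the compiler attaches the vtable here.
            let erased: &dyn fmt::Debug = typed;
            Some(erased)
        }
        _ => None, // other record types are left out of this sketch
    }
}

fn main() {
    let buf = [1u8, 2, 3];
    println!("{:?}", to_dyn_debug(&buf));
}
```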
I believe this goes for all kinds of indirection: all pointer-like types have provenance and alignment (raw pointers are allowed to be unaligned, but strictly only as long as you don't dereference them). This matters especially if you are actually planning on dereferencing said pointer-like object. I.e., you aren't allowed to do this in the other direction (reference -> pointer) or in other conceivable combinations, either.
Right. Reading padding bytes is also UB.
I don't think there can ever be one in the general case. Circumventing the type system is unsafe, pretty much by definition.
A limited subset of "plain old data" types, where certain constraints (e.g. no padding, no niche representations, etc.) are met, can be made to work, but not without unsafety, only by encapsulating it. The bytemuck crate is a famous attempt at formalizing such relationships.
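To illustrate the kind of encapsulation meant here, a std-only sketch follows. It is not bytemuck's API: the marker trait `PlainData` and the function `try_view` are hypothetical names. The implementor of the trait promises that the type has no padding and that every bit pattern is valid, and the function checks size and alignment before handing out a reference:

```rust
use std::mem::{align_of, size_of};

/// Hypothetical marker trait: the implementor promises that the type has no
/// padding bytes and that every bit pattern is a valid value.
unsafe trait PlainData: Copy {}

#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
struct Pair {
    a: u16,
    b: u16,
}

// SAFETY: two u16 fields under repr(C) leave no padding; any bits are valid.
unsafe impl PlainData for Pair {}

// Hypothetical helper: views the start of `bytes` as a `T` if size and alignment allow.
fn try_view<T: PlainData>(bytes: &[u8]) -> Option<&T> {
    if bytes.len() < size_of::<T>() {
        return None;
    }
    let ptr = bytes.as_ptr();
    if (ptr as usize) % align_of::<T>() != 0 {
        return None;
    }
    // SAFETY: length and alignment were checked, and `PlainData` promises
    // that any initialized bytes form a valid `T`.
    Some(unsafe { &*(ptr as *const T) })
}

fn main() {
    let words = [0x0102u16, 0x0304, 0x0506];
    // Viewing u16 storage as bytes gives a buffer that is known to be 2-aligned.
    let bytes: &[u8] = unsafe { std::slice::from_raw_parts(words.as_ptr() as *const u8, 6) };
    println!("{:?}", try_view::<Pair>(bytes));
    println!("{:?}", try_view::<Pair>(&bytes[1..])); // odd address, so rejected
}
```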
Ok, that makes sense! The one thing I'm unsure about is "pulling the vtable pointer out of nowhere"; that was the one thing I thought should actually be safe. I checked and you can't write e.g. struct NoDebug; &NoDebug as *const NoDebug as *const dyn Debug, which suggests Rust should be able to infer the vtable from the pointee type. If this type of cast is safe, I'd prefer to use it, since returning &dyn Debug instead of *const dyn Debug means you lose access to memory outside the bounds of the reference (as shown by Miri in my original post; I guess my newer example should also return *const dyn Debug instead of &dyn Debug).
This is exactly what I'm aiming for: plain-old-data structs with #[repr(C, packed)] and a safe wrapper that reads fields via unsafe { read_unaligned(addr_of!((*ptr).field)) }. What I'm also trying to gather is whether it's ok to have &Header, &StructType1, and &dyn Debug all simultaneously pointing to an instance of e.g. vec![1u8, 2u8, 3u8, ...].as_slice().as_ptr(). I'm not sure if it would violate any TBAA rules; I could only find vague guarantees in the Rust docs.
Rust doesn't have TBAA, so you're fine with shared references to both. Think of it this way: you could have made the struct_type field a Header and passed a &Header and a &StructType, just like passing &self and &self.field.
With entirely safe code, you can create all of a &Header, &StructType1, and &dyn Debug simultaneously from a &TotallySafe. This is a good indication there's a sound way to do it with your unsafe use case.
I should clarify: I'm asking whether it's ok for them all to point to the same location simultaneously. In your example you still can't create &Header and &StructType1 pointing to the same location. I am more convinced it's safe but still not certain. In general, I'm wondering if it's ok to have two different struct references to the same location, excluding the case where one type contains the other.
Rust does not have TBAA. But in C, one is allowed to cast a pointer-to-struct to a pointer-to-prefix if the types (and field names!) match, although this is pretty much a special case (along with the unsigned char * exception). The field name restriction is AFAIK specific to C (and rather silly, but oh well, 40-year-old languages).
I believe having references to a struct and its first field is fine. I am not sure about how strict provenance is in this case, though. I.e., the following may be OK:
let ptr: *const Header = &whole as *const Whole as *const Header;
let whole: *const Whole = ptr as *const Whole;
while this is potentially not OK, because the pointer only points to the first field, and not the whole object:
let header: *const Header = &whole.first_field;
let whole: *const Whole = header as *const Whole;
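Here is the first of the two patterns as a runnable sketch, with made-up field names and a hypothetical helper `round_trip`. The header pointer is derived from a pointer to the whole object, so casting it back and reading past the header should stay within what the original `&Whole` allows:

```rust
#[allow(dead_code)]
#[repr(C)]
struct Header {
    len: u8,
}

#[allow(dead_code)]
#[repr(C)]
struct Whole {
    first_field: Header,
    payload: u8,
}

// Hypothetical helper: casts whole -> header -> whole using raw pointers only.
fn round_trip(whole: &Whole) -> u8 {
    // The header pointer is derived from the whole object, not from
    // `&whole.first_field`, so it keeps access to the full `Whole`.
    let header_ptr = whole as *const Whole as *const Header;
    let whole_ptr = header_ptr as *const Whole;
    // SAFETY: `whole_ptr` has the same address and provenance as `whole`.
    unsafe { (*whole_ptr).payload }
}

fn main() {
    let w = Whole { first_field: Header { len: 1 }, payload: 42 };
    println!("payload: {}", round_trip(&w));
}
```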
I was partly worried because in C++ there's an exception letting you cast from struct Foo * to char * but NOT vice versa, and in general you also can't cast from struct Foo * to struct Bar * without -fno-strict-aliasing. But I am fairly certain now that this is not the case for Rust: casting to any pointer type (even transitively) and then converting to a reference seems to be fully allowed as long as the final reference points to valid, aligned data (even if there is aliasing with disjoint types). I just want to make sure that is correct, especially since I don't know much about provenance rules.
I think a very narrow example would be:
use std::fmt::Debug;

#[repr(C)]
#[derive(Debug)]
struct Foo {
    x: u8,
}

#[repr(C)]
#[derive(Debug)]
struct Bar {
    x: u8,
    y: u8,
}

#[repr(C)]
#[derive(Debug)]
struct Baz {
    x: u8,
    y: u8,
    z: u8,
}

unsafe fn test<'a>(buf: &'a [u8]) {
    let buf_ptr: *const u8 = buf.as_ptr();
    let foo_ref: &'a Foo = &*(buf_ptr as *const Foo);
    let bar_ref: &'a Bar = &*(buf_ptr as *const Bar);
    let dyn_ref: &'a dyn Debug = &*(buf_ptr as *const Bar as *const dyn Debug);
    // especially this one:
    let baz_ref: &'a Baz = &*(buf_ptr as *const Bar as *const dyn Debug as *const Baz);
    println!("{:?} {:?} {:?} {:?}", foo_ref, bar_ref, dyn_ref, baz_ref);
}

fn main() {
    unsafe { test(&[1, 2, 3]) };
}
Miri seems to think this is OK; if so, then I think as long as I only do pointer-to-pointer casts then pointer-to-reference casts (never reference-to-reference or reference-to-pointer), everything should be ok. That's really the bottom line.