Miri rejects casting to larger size?

Coder-256 · June 18, 2023, 9:54pm

The following code gives an error in Miri (playground):

#[repr(C)]
struct Foo {
    x: u8,
}

#[repr(C)]
struct Bar {
    x: u8,
    y: u8,
}


fn main() {
    unsafe {
        let data = &[10u8, 20];
        let foo_ref: &Foo = &*(data.as_ptr() as *const Foo);
        let bar_ref: &Bar = &*(foo_ref as *const Foo as *const Bar);
        println!("bar_ref.y: {}", bar_ref.y);
    }
}

Output:

   Compiling playground v0.0.1 (/playground)
    Finished dev [unoptimized + debuginfo] target(s) in 0.63s
     Running `/playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/bin/cargo-miri runner target/miri/x86_64-unknown-linux-gnu/debug/playground`
error: Undefined Behavior: trying to retag from <2675> for SharedReadOnly permission at alloc3[0x1], but that tag does not exist in the borrow stack for this location
  --> src/main.rs:17:29
   |
17 |         let bar_ref: &Bar = &*(foo_ref as *const Foo as *const Bar);
   |                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   |                             |
   |                             trying to retag from <2675> for SharedReadOnly permission at alloc3[0x1], but that tag does not exist in the borrow stack for this location
   |                             this error occurs as part of retag at alloc3[0x0..0x2]
   |
   = help: this indicates a potential bug in the program: it performed an invalid operation, but the Stacked Borrows rules it violated are still experimental
   = help: see https://github.com/rust-lang/unsafe-code-guidelines/blob/master/wip/stacked-borrows.md for further information
help: <2675> was created by a SharedReadOnly retag at offsets [0x0..0x1]
  --> src/main.rs:17:32
   |
17 |         let bar_ref: &Bar = &*(foo_ref as *const Foo as *const Bar);
   |                                ^^^^^^^
   = note: BACKTRACE (of the first span):
   = note: inside `main` at src/main.rs:17:29: 17:68

note: some details are omitted, run with `MIRIFLAGS=-Zmiri-backtrace=full` for a verbose backtrace

error: aborting due to previous error

Is this a false positive from Miri or is this code actually unsound? Doesn't the entirety of bar_ref point to valid memory?

SkiFire13 · June 18, 2023, 10:13pm

AFAIK once you cast to a reference (that is, the &* when you create foo_ref) you lose access to anything outside the range that a &Foo is allowed to access. Use only raw pointers if you want to do these kinds of casts, and avoid mixing raw pointers and references if you can.
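
A minimal sketch of the raw-pointer-only variant of the original example, assuming the data is a two-byte `u8` array. The helper name `read_fields` is made up for illustration. No `&Foo` is ever created, so the pointer keeps the provenance of the whole array:

```rust
#[repr(C)]
struct Foo {
    x: u8,
}

#[repr(C)]
struct Bar {
    x: u8,
    y: u8,
}

/// Hypothetical helper: returns (foo.x, bar.x, bar.y) read through raw pointers only.
fn read_fields(data: &[u8; 2]) -> (u8, u8, u8) {
    // `as_ptr` yields a pointer that may access the whole array.
    let foo_ptr = data.as_ptr() as *const Foo;
    // Pointer-to-pointer cast: no reference is created, so nothing is narrowed.
    let bar_ptr = foo_ptr as *const Bar;
    // SAFETY: both structs consist of `u8` fields (alignment 1, every bit
    // pattern valid) and `data` is at least `size_of::<Bar>()` bytes long.
    unsafe { ((*foo_ptr).x, (*bar_ptr).x, (*bar_ptr).y) }
}

fn main() {
    let (foo_x, bar_x, bar_y) = read_fields(&[10, 20]);
    println!("foo.x: {foo_x}, bar.x: {bar_x}, bar.y: {bar_y}");
}
```

The difference from the original is only that the cast to `*const Bar` starts from a raw pointer rather than from `foo_ref`.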

Coder-256 · June 18, 2023, 10:20pm

Thanks, that's very unfortunate. I was also hoping to use things like Foo/Bar as trait objects, which seems to require the use of references, i.e. I don't think you can actually call any methods on *const dyn Trait without casting to a reference and running into this issue. Specifically, Foo would be like a header containing a length, then you could recast as larger structs like Bar or Baz once you know the length.

Is there any way to use pointers as trait objects without another level of indirection/allocation? Or another solution that would work here instead?

quinedot · June 19, 2023, 12:11am

*const dyn Trait and &dyn Trait are the same size so I don't understand what you really mean here.

You could read the size through a *const Foo.

I don't think I actually understand what you want, but you can call methods on *const dyn Trait with enough effort.

H2CO3 · June 19, 2023, 4:41am

Miri doesn't have false positives. If it flags your code, it's unsound. (Disclaimer: I know some of the rules are not fully settled. It's better to be conservative.)

You can't just cast pointers to arbitrary types and reinterpret the referent. Things like alignment and provenance are a big part of the memory model of the language. Pointers and references are not just indices into a huge array of bytes; that mental model of how this language treats memory is too simplistic. (I.e., Rust doesn't "compile to the hardware"; you need to obey the abstract rules of the language at all times, and any hardware-based arguments are invalid and/or irrelevant when discussing the semantics.)

CAD97 · June 19, 2023, 5:22am

If you want some more context, see

github.com/rust-lang/unsafe-code-guidelines

Storing an object as &Header, but reading the data past the end of the header

opened 04:26AM - 11 Nov 20 UTC

thomcc

C-open-question A-provenance

This is related to https://github.com/rust-lang/unsafe-code-guidelines/issues/2 …but the read is not out of bounds of the allocation, not being written to by other threads, not the bytes of a `&mut Blah`, etc. That is to say, really the code is trying to model a dynamically sized type, that for one reason or another does not support (Note that there are a number of custom DST proposals).

So, I heard that it was UB for you to have a &T and read outside the bounds of that T, even if conceptually it's a totally in-bounds read. E.g. `T` here may be a ZST, or it may be a header after which a trailing array is expected, or standing that sits at the head of a trailing array, or it may be a struct that's the common shared fields of some set of other struct...

These are pretty common in unsafe code as it's a pattern which is both legal and useful in C and C++. It's pretty common in Rust too:

- It's not unheard of in C APIs to use a `#[repr(C)] struct Foo { _priv: [u8; 0] }`, as this is what bindgen uses. Some of these APIs then go on to use `&Foo` in the Rust code. (This is essentially a workaround for a lack of a stable `extern Type`). This code doesn't read the data, so the only issue would be if we told LLVM it could assume things about the pointer that turn out to be untrue in a situation like cross-lang LTO, probably.
- Similarly, I've seen other FFI code that used a `struct CStr([u8; 0])` for a similar purpose, as a version of `std::ffi::CStr` that you can actually pass to C directly. (I even almost did this for [ffi_support::FfiStr](https://docs.rs/ffi-support/0.4.2/ffi_support/struct.FfiStr.html), but went with a pointer inside so I could easily check for code passing in null).
- `bitvec` has a `BitSlice` type which acts a lot like a slice that magically has bit-level indexing. Internally it's something like `struct BitSlice { _mem: [()] }` which lets it behave like an unsized type. The "pointer" and length are both specially encoded values that contain both the actual pointer/length as well as bit-level offsets for tracking where within a byte things are. There are a lot of reasons this might be illegal, but I had not thought `mem::size_of_val` returning the wrong value was the actual one.
- `anyhow::Error` internally wraps a `Box<ErrorImpl<()>>`, where `ErrorImpl<T>` contains a vtable, a backtrace, and then the `T`. `ErrorImpl<()>` is used as it behaves as the "common header" for real ErrorImpl values. On construction, `Box<ErrorImpl<T>>` is converted to `Box<ErrorImpl<()>>`, when stored in the Error. Whenever a method is called that needs to delegate to the vtable, the `Box<ErrorImpl<()>>` is converted into the right pointer type for the vtable function (one of `&ErrorImpl<()>`, `&mut ErrorImpl<()>`, `Box<ErrorImpl<()>>`) which is called with that pointer. The first thing the vtable function generally does is convert the reference to e.g. `&ErrorImpl<T>`, example: https://github.com/dtolnay/anyhow/blob/99c982128458fecb8d1d7aff9478dd77dac0ee3b/src/error.rs#L538-L545. (I had always kind of thought it wasn't okay to use `Box<T>` here, but I'm surprised that stuff like `&ErrorImpl<()>` to `&ErrorImpl<RealType>` isn't okay either).
- `wio-rs` contains `VariableSizedBox` which provides this pattern in a library form, and IIUC is mostly intended for the flexible-array-member case. The API attempts to launder pointers to the object, which is... very non-obvious. It seems like it plausibly avoids the issue here, though, but it's insanely subtle, and if this is the recommended pattern, I suspect it will need a very good nomicon entry. https://github.com/retep998/wio-rs/blob/9bf021178b2d02485f1bd35e6cff41bf52d4a9a2/src/vsb.rs#L98-L113
- I do [something similar](https://github.com/thomcc/arcstr/blob/main/src/arc_str.rs#L725-L728) in `arcstr`, where there's a header and a variable length segment that trails it. I avoided issues here by luck, as I took great care to avoid ever putting the inner type behind a reference. This was lucky since I wasn't aware of this at all, and did it for other reasons. This was painful as it required hard-coding field offsets.
- This isn't to say anything of the numerous C or C++ APIs which expose polymorphism in this way. In C++ this is how single non-virtual inheritance is represented, so it's especially common, although it was common in C too. Additionally, C code with a flexible array member is in tons of places, and not just Windows APIs.

This is just a few off the top; there's a lot of unsafe code that does this. Personally, I had thought it was allowed so long as you don't go past the actual bounds of the allocation; it makes *some* sense that it's not though, unfortunately. (Somehow, I don't think I've ever had miri trouble me about it, but it's seeming like it's just because of luck && coincidence more than anything else).

Anyway, I think if this is UB we should start being way more vocal about it, because it's a totally legal pattern in C and C++, and common.

and things crosslinked from there.

alice · June 19, 2023, 7:16am

Tokio works like this, and avoids these issues by staying entirely in raw pointer land.

2e71828 · June 19, 2023, 7:30am

One thing you can do is add an unsized field to the end of Foo to represent the trailing data:

#![feature(ptr_metadata)]

#[repr(C)]
struct Foo {
    x: u8,
    data: [Bar]
}

#[repr(C)]
struct Bar {
    x: u8,
    y: u8,
}


fn main() {
    let data: &'static [u8] = &[10, 20, 30];
    let foo_ref: &Foo = unsafe {
        // On stable, you have to do this with something like `transmute`
        // instead, and I don't know the right way to make it happen...
        &*(std::ptr::from_raw_parts(data.as_ptr() as *const (), 1))
    };
    let bar_ref: &Bar = &foo_ref.data[0];
    println!("bar_ref.y: {}", bar_ref.y);
}

Or, if you want to read the number of items out of the header:

#![feature(ptr_metadata)] // nightly only; needed for std::ptr::from_raw_parts

#[repr(C, align(1))]
struct Foo {
    head: Header,
    data: [Bar]
}

#[repr(C, align(1))]
struct Header {
    x: u8,
    bar_count: u8
}

#[repr(C, align(1))]
struct Bar {
    x: u8,
    y: u8,
}

use std::mem::size_of;

impl Foo {
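    pub fn from_bytes(addr: &[u8]) -> &Self {
        // The buffer must hold at least the fixed-size header.
        assert!(addr.len() >= size_of::<Header>());
        let header: &Header = unsafe { &*(addr.as_ptr() as *const Header) };
        let bar_count = header.bar_count as usize;

        // The buffer must also hold every trailing `Bar` the header announces.
        assert!(addr.len() >= size_of::<Header>() + bar_count * size_of::<Bar>());
        // The element count becomes the metadata of the fat `*const Foo`.
        unsafe { &*(std::ptr::from_raw_parts(addr.as_ptr() as *const (), bar_count)) }
    }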
    }
}
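
On stable, one possible sketch (not necessarily what the poster had in mind) is to build a slice pointer with `std::ptr::slice_from_raw_parts` and cast it to `*const Foo`; a cast between two pointers whose unsized tail is a slice keeps the length metadata. The `Option` return type and the direct read of `bytes[1]` are choices made for this sketch:

```rust
use std::mem::size_of;
use std::ptr;

#[repr(C)]
struct Header {
    x: u8,
    bar_count: u8,
}

#[repr(C)]
struct Bar {
    x: u8,
    y: u8,
}

#[repr(C)]
struct Foo {
    head: Header,
    data: [Bar],
}

impl Foo {
    /// Returns `None` if the buffer is too short for the header plus the
    /// number of `Bar`s that the header announces.
    fn from_bytes(bytes: &[u8]) -> Option<&Foo> {
        if bytes.len() < size_of::<Header>() {
            return None;
        }
        // With repr(C), `bar_count` is the second byte of the header.
        let bar_count = bytes[1] as usize;
        if bytes.len() < size_of::<Header>() + bar_count * size_of::<Bar>() {
            return None;
        }
        // The slice pointer is only a carrier for (address, length);
        // the cast to `*const Foo` preserves both.
        let fat = ptr::slice_from_raw_parts(bytes.as_ptr() as *const Bar, bar_count) as *const Foo;
        // SAFETY: every field is a `u8` (alignment 1, all bit patterns valid),
        // the length was checked above, and `fat` is derived from `bytes`,
        // so it may access the whole range.
        Some(unsafe { &*fat })
    }
}

fn main() {
    let foo = Foo::from_bytes(&[7, 2, 10, 20, 30, 40]).unwrap();
    println!("x: {}, count: {}", foo.head.x, foo.head.bar_count);
    for bar in &foo.data {
        println!("bar: ({}, {})", bar.x, bar.y);
    }
}
```

Because the `&Foo` is created from a pointer covering the whole buffer, indexing `foo.data` afterwards stays within what the reference is allowed to access.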

Coder-256 · June 19, 2023, 5:32pm

Thanks to everyone who responded, all of your replies have been helpful! Now I have just a few questions:

1. As I understand it, the problem with my original code was: if you cast from a reference to a pointer, you can only access memory that was within the bounds of that reference. Is this correct? This would also mean you can't access any padding bytes, correct?
2. Assuming proper size and alignment, is it ok to use pointers to reinterpret memory between &[u8] and a properly-aligned #[repr(C)] struct? What about between different #[repr(C)] structs? I'm aware that some of the rules are currently unsettled; thus, I'm wondering what are the specific restrictions and guarantees at this time? The reference and nomicon are pretty vague. This kind of type punning is usually UB in C++ without -fno-strict-aliasing, but from what I can tell, it should be ok in Rust?
3. Is it safe to create a reference to a #[repr(C, packed)] struct, without worrying about alignment? I know that it's unsafe to reference any fields, but what about creating a reference to the struct itself?

I appreciate the responses about trait shadowing and using entirely raw pointers; I am wondering if there is a safe approach, which doesn't require extra indirection, to recast memory and still support trait objects in stable Rust. For example, is this safe?:

use std::ptr::{read_unaligned, addr_of};
use std::fmt;

#[repr(C, packed)]
struct Header {
    struct_type: u8,
}

#[repr(C, packed)]
struct StructType1 {
    struct_type: u8,
    foo: u16,
    // ...
}

#[repr(C, packed)]
struct StructType2 {
    struct_type: u8,
    bar: i64,
    // ...
}

impl fmt::Debug for StructType1 {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        unsafe {
            let ptr = self as *const StructType1;
            f.debug_struct("StructType1")
                .field("struct_type", &read_unaligned(addr_of!((*ptr).struct_type)))
                .field("foo", &read_unaligned(addr_of!((*ptr).foo)))
                .finish()
        }
    }
}

unsafe fn to_dyn_debug<'a>(buf: &'a [u8]) -> &'a dyn fmt::Debug {
    // assume `buf` is very large
    let buf_ptr = buf.as_ptr();
    let header_ptr = buf_ptr as *const Header;
    let struct_type = read_unaligned(addr_of!((*header_ptr).struct_type));
    if struct_type == 1 {
        &*(buf_ptr as *const StructType1 as *const dyn fmt::Debug)
    } else {
        // ...
    }
}

Are there any problems with either the recasting, or aliasing with multiple types at once?

quinedot · June 19, 2023, 6:56pm

The bounds statement is correct. You could write over the padding bytes, but you can't read padding bytes as they may be uninitialized memory. To read bytes which were padding in some type or another, you would have to be 100% sure they were written to first, e.g. by using MaybeUninit to write the bytes. (It doesn't matter if the rest of the type was initialized, as a valid way to write a type with padding is to write only the non-padding parts.)

It's possible with enough work, though it's much nicer when bytemuck applies. I don't know anywhere that summarizes all the restrictions and guarantees. Aliasing is its own topic on top of what's been discussed here; in extreme brevity:

- don't create a &mut unless nothing else can read or write the same memory for the duration (it promises exclusivity)
- don't create a & unless nothing can write the same memory
  - unless you use UnsafeCell directly or indirectly in some sound way
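
A small illustration of the UnsafeCell exception, using `std::cell::Cell` (which wraps `UnsafeCell`). The function name `bump_both` is made up for this sketch. Two shared references to the same `Cell` may both be used to write, entirely in safe code:

```rust
use std::cell::Cell;

/// Hypothetical helper: increments through both shared references and
/// returns what `a` holds afterwards. `a` and `b` may alias.
fn bump_both(a: &Cell<u32>, b: &Cell<u32>) -> u32 {
    a.set(a.get() + 1);
    b.set(b.get() + 1);
    a.get()
}

fn main() {
    let counter = Cell::new(0);
    // Both arguments point at the same memory; this is fine for `&Cell<_>`.
    println!("{}", bump_both(&counter, &counter));
}
```

With plain `&u32` there would be no safe way to write at all, and with `&mut u32` the two arguments could not alias.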

Rust has explicitly chosen to not have a way to globally disable the aliasing requirements of references such as -fno-strict-aliasing.

Yes. (If you can do it in safe code, it's safe.)

No, it's pulling the vtable pointer out of nowhere. You should create a reference to the concrete type, and then type-erase that (let the compiler coerce it).

Here's what I came up with. Instead of as_bytes I could have used bytemuck probably.
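
A sketch of the "reference to the concrete type, then coerce" approach, reusing the struct names from the question. This is not the code behind the link above; the length checks, the `Option` return type and the by-value field copies in `Debug` are assumptions of this sketch:

```rust
use std::fmt;
use std::mem::size_of;

#[repr(C, packed)]
struct StructType1 {
    struct_type: u8,
    foo: u16,
}

#[repr(C, packed)]
struct StructType2 {
    struct_type: u8,
    bar: i64,
}

impl fmt::Debug for StructType1 {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Copy packed fields out by value; references to them are not allowed.
        let (struct_type, foo) = (self.struct_type, self.foo);
        f.debug_struct("StructType1")
            .field("struct_type", &struct_type)
            .field("foo", &foo)
            .finish()
    }
}

impl fmt::Debug for StructType2 {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let (struct_type, bar) = (self.struct_type, self.bar);
        f.debug_struct("StructType2")
            .field("struct_type", &struct_type)
            .field("bar", &bar)
            .finish()
    }
}

fn to_dyn_debug(buf: &[u8]) -> Option<&dyn fmt::Debug> {
    let tag = *buf.first()?;
    if tag == 1 && buf.len() >= size_of::<StructType1>() {
        // SAFETY: packed structs have alignment 1, all fields are integers,
        // and the length was just checked.
        let concrete: &StructType1 = unsafe { &*(buf.as_ptr() as *const StructType1) };
        // The compiler supplies the vtable during this unsized coercion.
        let erased: &dyn fmt::Debug = concrete;
        Some(erased)
    } else if tag == 2 && buf.len() >= size_of::<StructType2>() {
        // SAFETY: same reasoning as above.
        let concrete: &StructType2 = unsafe { &*(buf.as_ptr() as *const StructType2) };
        let erased: &dyn fmt::Debug = concrete;
        Some(erased)
    } else {
        None
    }
}

fn main() {
    println!("{:?}", to_dyn_debug(&[1, 0x34, 0x12]));
}
```

The multi-byte fields are read in native byte order, so the printed value of `foo` depends on the target's endianness.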

H2CO3 · June 19, 2023, 7:12pm

I believe this goes for all kinds of indirection – all pointer-like types have provenance and alignment (although raw pointers are allowed to be unaligned, but strictly only as long as you don't dereference them). And especially if you are actually planning on dereferencing said pointer-like object. I.e., you aren't allowed to do this in the other direction (reference -> pointer) or in other conceivable combinations, either.

Right. Reading padding bytes is also UB.

I don't think there can ever be one in the general case. Circumventing the type system is unsafe, pretty much by definition.

A limited subset of "plain old data" types, where certain constraints (e.g. no padding, no niche representations, etc.) are met, can be made to work, but not without unsafety, only by encapsulating it. The bytemuck crate is a famous attempt at formalizing such relationships.
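
To make the encapsulation idea concrete, here is a hand-rolled sketch. It is not bytemuck's API: the trait `PlainData` and the function `view_as` are names invented for this example. The unsafety sits in the `unsafe impl` promise and in one checked cast, while callers stay in safe code:

```rust
use std::mem::{align_of, size_of};

/// Hypothetical marker trait. Implementors promise: no padding, no invalid
/// bit patterns, no pointers or references inside.
unsafe trait PlainData: Copy {}

#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
struct Pair {
    x: u8,
    y: u8,
}

// SAFETY: two `u8` fields with repr(C): no padding, every bit pattern valid.
unsafe impl PlainData for Pair {}

/// Views the start of `bytes` as a `T`, or returns `None` if the buffer is
/// too short or misaligned for `T`.
fn view_as<T: PlainData>(bytes: &[u8]) -> Option<&T> {
    if bytes.len() < size_of::<T>() {
        return None;
    }
    let ptr = bytes.as_ptr();
    if (ptr as usize) % align_of::<T>() != 0 {
        return None;
    }
    // SAFETY: size and alignment were checked, and `T: PlainData` promises
    // that any initialized bytes form a valid `T`.
    Some(unsafe { &*(ptr as *const T) })
}

fn main() {
    println!("{:?}", view_as::<Pair>(&[10, 20]));
}
```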

Coder-256 · June 19, 2023, 7:17pm

Ok, that makes sense! The one thing I'm unsure about is "pulling the vtable pointer out of nowhere"; that was the one thing I thought should actually be safe. I checked and you can't write e.g. struct NoDebug; &NoDebug as *const NoDebug as *const dyn Debug, which suggests Rust should be able to infer the vtable from the pointee type. If this type of cast is safe, I'd prefer to use it, since returning &dyn Debug instead of *const dyn Debug means you lose access to memory outside the bounds of the reference (as shown by Miri in my original post; I guess my newer example should also return *const dyn Debug instead of &dyn Debug).

quinedot · June 19, 2023, 7:21pm

I think you're correct and it wasn't actually coming out of nowhere.

Coder-256 · June 19, 2023, 7:21pm

This is exactly what I'm aiming for: plain-old-data structs with #[repr(C, packed)] and a safe wrapper that reads fields via unsafe { read_unaligned(addr_of!((*ptr).field)) }. What I'm also trying to gather is whether it's ok to have &Header, &StructType1, and &dyn Debug all simultaneously pointing to an instance of e.g. vec![1u8, 2u8, 3u8, ...].as_slice().as_ptr(). I'm not sure if it would violate any TBAA rules; I could only find vague guarantees in the Rust docs.

quinedot · June 19, 2023, 7:28pm

Rust doesn't have TBAA so you're fine with shared references to both. Think of it this way: You could have made the struct_type field a Header and passed a &Header and a &StructType just like passing &self and &self.field.

Maybe play around with things like this:

struct TotallySafe {
    header: Header,
    data: Data,
}
enum Data {
    S1(StructType1),
    S2(StructType2),
}

With entirely safe code, you can create all of a &Header, &StructType1, and &dyn Debug simultaneously from a &TotallySafe. This is a good indication there's a sound way to do it with your unsafe use case.
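
A sketch of that experiment in fully safe code. The field layouts are simplified placeholders (the payload structs here do not repeat `struct_type`), and the function name `views` is made up for illustration:

```rust
use std::fmt::Debug;

#[derive(Debug)]
struct Header {
    struct_type: u8,
}

#[derive(Debug)]
struct StructType1 {
    foo: u16,
}

#[derive(Debug)]
struct StructType2 {
    bar: i64,
}

#[derive(Debug)]
enum Data {
    S1(StructType1),
    S2(StructType2),
}

#[derive(Debug)]
struct TotallySafe {
    header: Header,
    data: Data,
}

/// Hands out three overlapping shared views of the same value.
fn views(t: &TotallySafe) -> (&Header, Option<&StructType1>, &dyn Debug) {
    let s1 = match &t.data {
        Data::S1(s) => Some(s),
        Data::S2(_) => None,
    };
    (&t.header, s1, t)
}

fn main() {
    let a = TotallySafe {
        header: Header { struct_type: 1 },
        data: Data::S1(StructType1 { foo: 7 }),
    };
    let b = TotallySafe {
        header: Header { struct_type: 2 },
        data: Data::S2(StructType2 { bar: -1 }),
    };
    for t in [&a, &b] {
        let (header, s1, any) = views(t);
        println!("{} {:?} {:?}", header.struct_type, s1.map(|s| s.foo), any);
    }
    if let Data::S2(s2) = &b.data {
        println!("bar: {}", s2.bar);
    }
}
```

All three references are alive at once and overlap in memory, and the borrow checker accepts it because none of them is exclusive.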

Coder-256 · June 19, 2023, 7:33pm

I should clarify, I'm asking about whether it's ok for them all to point to the same location simultaneously. In your example you still can't create &Header and &StructType1 pointing to the same location. I am more convinced it's safe but still not certain. In general, wondering if it's ok to have two different struct references to the same location, excluding the case where one type contains the other.

H2CO3 · June 19, 2023, 7:36pm

Rust does not have TBAA. But in C, one is allowed to cast a pointer-to-struct to a pointer-to-prefix if the types (and field names!) match, although this is pretty much a special case (along with the unsigned char * exception). The field name restriction is AFAIK specific to C (and rather silly, but oh well, 40-year-old languages).

I believe having references to a struct and its first field is fine. I am not sure about how strict provenance is in this case, though. I.e., the following may be OK:

let ptr: *const Header = &whole as *const Whole as *const Header;
let whole: *const Whole = ptr as *const Whole;

while this is potentially not OK, because the pointer only points to the first field, and not the whole object:

let header: *const Header = &whole.first_field;
let whole: *const Whole = header as *const Whole;

So better be conservative here.
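
A runnable sketch of the first variant (the one described as possibly OK), with hypothetical `Header` and `Whole` types and a made-up function name `roundtrip`. The raw pointer is derived from a reference to the whole struct, so casting it back does not reach past what it was created from:

```rust
#[repr(C)]
struct Header {
    tag: u8,
}

#[repr(C)]
struct Whole {
    first_field: Header,
    payload: u8,
}

/// Casts `&Whole` to a header pointer and back, then reads through both.
fn roundtrip(whole: &Whole) -> (u8, u8) {
    // Derived from a reference to the whole struct, not from `&whole.first_field`.
    let header: *const Header = whole as *const Whole as *const Header;
    let back: *const Whole = header as *const Whole;
    // SAFETY: both pointers come from a valid `&Whole`; with repr(C) the
    // first field sits at offset 0, so `header` points at a valid `Header`.
    unsafe { ((*header).tag, (*back).payload) }
}

fn main() {
    let whole = Whole {
        first_field: Header { tag: 1 },
        payload: 42,
    };
    println!("{:?}", roundtrip(&whole));
}
```

The second variant, which starts from `&whole.first_field`, has the same shape as the code Miri rejected in the original post.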

quinedot · June 19, 2023, 7:36pm

You can. Shared references are allowed to overlap even if there's no type-level "containment".

2e71828 · June 19, 2023, 7:44pm

There’s also

#[derive(Debug)]
struct TotallySafe<Data:?Sized> {
    header: Header,
    data: Data,
}

#[derive(Debug)]
struct Payload1(…);

Then, if you have an &TotallySafe<Payload1>, you can coerce it to either &TotallySafe<dyn Debug> or &dyn Debug.

Coder-256 · June 19, 2023, 7:47pm

I was partly worried because in C++ there's an exception letting you cast from struct Foo * to char * but NOT vice versa, and in general you also can't cast from struct Foo * to struct Bar * without -fno-strict-aliasing. But I am fairly certain now that this is not the case for Rust: casting to any pointer type (even transitively) and then converting to a reference seems to be fully allowed as long as the final reference points to valid, aligned data (even if there is aliasing with disjoint types). I just want to make sure that is correct, especially since I don't know much about provenance rules.

I think a very narrow example would be:

use std::fmt::Debug;

#[repr(C)]
#[derive(Debug)]
struct Foo {
    x: u8,
}

#[repr(C)]
#[derive(Debug)]
struct Bar {
    x: u8,
    y: u8,
}

#[repr(C)]
#[derive(Debug)]
struct Baz {
    x: u8,
    y: u8,
    z: u8,
}

unsafe fn test<'a>(buf: &'a [u8]) {
    let buf_ptr: *const u8 = buf.as_ptr();
    let foo_ref: &'a Foo = &*(buf_ptr as *const Foo);
    let bar_ref: &'a Bar = &*(buf_ptr as *const Bar);
    let dyn_ref: &'a dyn Debug = &*(buf_ptr as *const Bar as *const dyn Debug);
    // especially this one:
    let baz_ref: &'a Baz = &*(buf_ptr as *const Bar as *const dyn Debug as *const Baz);
    println!("{:?} {:?} {:?} {:?}", foo_ref, bar_ref, dyn_ref, baz_ref);
}

fn main() {
    unsafe { test(&[1, 2, 3]) };
}

Miri seems to think this is OK; if so, then I think as long as I only do pointer-to-pointer casts then pointer-to-reference casts (never reference-to-reference or reference-to-pointer), everything should be ok. That's really the bottom line.
