Pointer provenance

What is the situation on pointer provenance? There were a few threads a few weeks ago, and this is clearly something miri takes seriously: It will complain on e.g. converting a reference to the first field of a #[repr(C)] struct to a reference of the full struct.

Miri example of repr(C) struct
#[repr(C)]
#[derive(Debug)]
struct Full {
    a: A,
    b: B,
}

#[repr(C)]
#[derive(Debug)]
struct A {
    field: u32
}

#[repr(C)]
#[derive(Debug)]
struct B {
    field: u32
}

fn convert(r: &A) -> &Full {
    unsafe { &*(r as *const A as *const Full) }
}

fn main() {
    let full = Full {
        a: A { field: 10 },
        b: B { field: 20 },
    };
    
    let a = &full.a;
    
    println!("{:?}", convert(a));
}

Pick miri under tools in playground to see it fail. Are there any authoritative documents on this? I came across this, which says

The exact form of provenance in Rust is unclear. [...] In the following, we give some examples of what provenance could look like.

Considering miri takes it so seriously, it seems quite important to understand. Here's my current understanding:

  1. References always have exactly the provenance corresponding to the size of the referent.
  2. Raw pointers can have a larger provenance, e.g. slice::as_mut_ptr has provenance of the entire slice, not just the first element.
  3. Converting a reference to a raw pointer gives the raw pointer the provenance of the reference.
  4. Raw pointers obtained from the allocator have provenance of the allocation.

So provenance is lost on conversion to reference. This is supposedly quite important for the soundness of IterMut, which is implemented as a start and end raw pointer both with provenance of the entire slice, but the returned mutable references do not alias due to their provenance being disjoint.

Of course this raises a question: It seems like the raw pointers in IterMut alias with the returned mutable reference? Is this ok merely because they are no longer used to access those parts of the slice? That appears to be a special interaction between aliasing and raw pointers.

But all of this is what I've gathered from random forum posts, and I'd like something more reliable. Can someone clarify what's going on? The nomicon doesn't appear to have anything to say about this.

2 Likes

The code for incrementing the pointer is here:

unsafe fn post_inc_start(&mut self, offset: isize) -> * $raw_mut T {
    if mem::size_of::<T>() == 0 {
        zst_shrink!(self, offset);
        self.ptr
    } else {
        let old = self.ptr;
        //Note
        self.ptr = self.ptr.offset(offset);
        old
    }
}

Where I have inserted the note is the only place where the pointers can alias, unless it's a ZST, where I'm unfamiliar with the rules surrounding aliaising, but my assumption is that in theory it's okay since there is no state that you can screw up.


Also, I suspect that MIRI is getting mad at you since you could write a similar function which is UB:

fn convert(r: &mut A) -> &mut Full {
    unsafe { &mut *(r as *mut A as *mut Full) }
}

fn main() {
    let full = Full {
        a: A { field: 10 },
        b: B { field: 20 },
    };
    
    let a = &mut full.a;
    let full_2 = convert(a);
    let b = &mut full.b;
    if &full_2.b as usize == b as usize {
        println!("This is expected");
    } else if &mut full_2.b as usize == b as usize {
        println!("This is also expected");
    } else {
        println!("This is what might be produced");
    }
}

In this case the compiler might expect &[mut]full_2.b will never alias with b since there cannot be more than one alias to b.


Similarly, if we keep your original example, I could run the following code:

fn convert(r: &A) -> &Full {
    unsafe { &*(r as *const A as *const Full) }
}
 
fn main() {
    let mut full = Full {
        a: A { field: 10 },
        b: B { field: 20 },
    };
    
    let a = &full.a;
    let b = &mut full.b;
    let full_2 = convert(a);
    let b_2 = &full_2.b; //Uh oh!
}

Which would cause UB without changing your original function.


Ack! I realized after posting that I never answered your original question:

I unfortunately have no idea, I wasn't even aware this existed prior to reading your question. I am simply able to show why this is UB.

cc @RalfJung

1 Like

The simple answer is that you, the developer, have to follow the strictest possible interpretation. The slightly longer answer is that because we aren't exactly sure the exact memory model Rust/rustc should be using, the only way you can be absolutely sure your code isn't UB is to code against the strictest interpretation until such a time as rustc commits to implementing a loser one.

Per my memory of how the Stacked Borrows proposal is worded, this is not problematic because raw pointers "don't matter" except at their point of use. I think any successful memory model for Rust will have to obey that stipulation.

Consider a very simplistic example:

struct S {
    a: i32,
    b: i32,
}

let mut s = S::default();
let s_mut_raw = (&mut s) as *mut S;
let a_mut_ref = &mut s_mut_raw.a;
let b_mut_ref = &mut s_mut_raw.b;

This should look fine. I don't think any memory model could possibly try to argue that this is UB. However, s_mut_raw does have {mutable, raw} provenance over the whole S, "aliasing" [a|b]_mut_ref, which have {mutable, unique} provenance over each field, respectively.

What Stacked Borrow says instead is that a further "use" of a_mut_raw, which is lower on the "borrow stack", makes it so that the references higher on the borrow stack are no longer valid. Any further use of them is UB. (And since these are safe references, the compiler is (probably) allowed to insert however many spurious uses it wants wherever the reference exists.)

The key point here is that while references assert their access parents purely by existing, any raw pointers only matter when they are used. Stacked Borrows doesn't encode anything special for references, actually; they follow the same rule of only impacting the borrow stack when used. The difference is that a lot more things count as a use for references, potentially up to the phantom drop at end of (NLL) scope or even just existing, as well as allowed inserted spurious uses.

(I don't recall the exact terminology Stacked Borrows uses, but IIRC it tracks three kinds of "borrows" of any memory location: Shr, Uniq, and Raw. Shr and Uniq are & and &mut's borrow provenance, and Raw is the special provenance that raw pointers use.)


To be perfectly clear: just because miri's Stacked Borrows implementation says something is or isn't UB doesn't mean that it definitely is UB and will be forever. Stacked Borrows is only a (very strong) proposal at the current time. However, miri reporting UB does mean that something is almost certainly UB in Rust as implemented today.

If we end up adopting a weaker memory model than Stacked Borrows, then it is theoretically possible that some things it doesn't allow and miri reports as UB become defined in the future. This is quite unlikely, however, and if Stacked Borrows doesn't accept it, today's compiler almost certainly makes it UB to execute.

3 Likes

Allow me to say that this is extremely disappointing. There, I said it. This is a common idiom in C that is explicitly allowed by the standard, which otherwise is very picky about casting pointers left and right. Not only is it allowed, it's often used for simulating an inheritance-like abstraction pattern, which, given the limited abilities of the C type system, comes very handy at times.

It would be sad if Rust continued to mark this as UB since it would potentially make interop with a ton of C libraries impossible or hard, and naive wrapper implementations would (continue to) have insta-UB.

I'm on my phone right now so I can't provide an exact reference but I think a prominent example of the pattern is Glib.

1 Like

Stacked Borrows has been mentioned a few times; if you want to know more, here's a document with the current status, and here's a recording of a talk I gave on the subject. As @CAD97 pointed out, this is just an experiment, but it is also the most precise approximation of Rust's aliasing and provenance rules that we have so far, so it could help answer some of your questions. If you have further questions, please ask. :slight_smile:

I haven't read the entire thread here, but a few comments stood out so I'd like to respond:

I think you will be very interested in the document I linked above; you shouldn't have to reverse engineer the rules from what Miri is doing. :slight_smile:

No, it is not. The only action that truly loses provenance is casting to an integer and back.

I am not entirely happy with the situation either. It's not like I defined Stacked Borrows with the intent of disallowing this pattern; I defined Stacked Borrows with the goal of having a concrete model to talk about so that we could evaluate its pros and cons. This is definitely a con. As such, we already have an open issue tracking the problem.

Notice that the comparison with C is not entirely accurate -- in C, every pointer is a raw pointer. If you faithfully translate that to Rust (which requires some nightly features), using raw pointers everywhere you arrive at this and then the pattern is allowed by Stacked Borrows (as confirmed by Miri).
In particular, what you said about C interop is wrong -- FFI with C code that does this is still legal as long as the Rust side uses raw pointers. So, for example, it is not legal to have Rust code create a reference to a field and then pass that as a raw ptr to C and let C turn that into a ptr to the entire struct. But it is legal to have Rust create a reference to the entire struct, pass that to C, and then have C do the get-ptr-to-field and then later recover-full-ptr-from-field-ptr parts. All C operations are raw ptr operations, so that basically corresponds to the code I linked above.

And the other point is, while I agree that supporting this pattern is desirable, the question is how much we are willing to sacrifice to support it. Assuming that supporting this pattern is fundamentally incompatible with our alias assumptions, and we would have to stop using noalias and could not do any reference-based optimizations ever anywhere in Rust -- is that a price worth paying? My personal thinking is, no it is not. So, the question is not "do we want to support this pattern", the question is "which price are we willing to pay in order to support this pattern". This is a decision we as a community will have to make -- but we should first gather some more data; after all, thus far we only have two data points (Stacked Borrows, and the alternative of not making any alias assumptions).
It is possible that there is a "cheaper" way to allow this, something that still permits many optimizations, but the only way to find out is for someone to define a variant of Stacked Borrows (or an entirely different approach) that both accepts such code, and permits some strong optimizations. I am optimistic that we can fix or at least improve upon some of the deficiencies of Stacked Borrows (like this one); this one here however is particularly nasty.

We should probably continue this discussion elsewhere though.

7 Likes

I meant that any "extra" provenance is lost, e.g. the reason why my example in the first post fails miri is that the extra provenance of the remaining fields of the struct was lost on conversion to reference. Is this correctly understood?

As a side note, it's funny that last night I was suggesting something related to a colleague. He was trying to perform type punning using unions, which is disallowed in C++ but allowed in C, so I told him to pull the conversion code into a separate file, compile it as C, and just use the function from C++.

Agreed.

1 Like

Ah... that's not how I think of the term "provenance", hence the confusion. "provenance being lost" to me sounds like "the information about where the pointer comes from is lost, it does not matter any more", and that is definitely not the case with references -- provenance is precisely tracked for them as it does matter where they come from.

But the observation is accurate insofar as when you create a reference, the resulting pointer is only permitted to access the memory it references, with the size determined by the type of the reference. So there's something "lost", and it is the ability for this pointer (or any of its descendants) to access any other memory. But provenance is preserved; in fact, provenance is used to enforce this rule -- without provenance, we couldn't recognize this pointer, so how would we even forbid it from doing anything?

What term would you use for the region of memory a pointer or reference is allowed to access?

Hm, I don't think I have a term for that.

But it's certainly not "provenance". "provenance" is a thing that is attached to the pointer value and gets carried around with it to be able to distinguish different pointers that point to the same memory address -- in Stacked Borrows, provenance is the "tag". That thing can then be used to define the region of memory it is allowed to access, but there is an indirection going on here. For example, with Stacked Borrows, doing something with another pointer can reduce the region of memory that your pointer can access -- using a mutable reference that is a "parent" of pointer p invalidates p.

7 Likes

Hey @RalfJung, I did a box -> pointer -> box dance and stopped miri from complaining about the original issue:

fn main() {
    let full = Box::new(Full {
        a: A { field: 10 },
        b: B { field: 20 },
    });
    
    let full = Box::into_raw(full);
    
    let a: &A = unsafe { &(&*full).a };
    println!("{:?}", convert(a));
    
    unsafe {
        drop(Box::from_raw(full));
    }
}

playground

What's going on here? Without dancing into raw pointers, I get an error when dropping the box.

@alice what you are observing is that for raw pointers, Miri currently does not do fine-grained tracking, so it "confuses" the ptr returned by convert with the one in full.

But long-term, we want to at least experiment with doing that tracking more carefully -- we have to, if we want to faithfully model LLVM's aliasing rules.

So, this is an example of "it works with current Stacked Borrows, but that is just an experiment, and this might well not work with the final model". This particular instance is tracked here.

6 Likes

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.