Aliasing rules and mutable pointer dereference

Continuing the discussion from How unsafe is allowed in unsafe code?:

In the other thread, I had a hypothesis, and I'm not sure if it's true or not.

I got the following reply:

But I thought:

So is the above example UB? And what if I modify it like this:

fn foo(r: &mut i32, n: usize) {
    unsafe { *(n as *mut i32) += 1 };
    *r *= 2;
}

fn main() {
    let mut x: i32 = 1;
    let r: &mut i32 = &mut x;
    let n: usize = r as *mut i32 as usize;
    foo(r, n);
    println!("{x}");
}

(Playground)

How can the compiler know that *r in foo is modified in the unsafe block? Due to the aliasing rules, couldn't the compiler (in theory) re-write the function as follows:

fn foo(r: &mut i32, n: usize) {
    let t: i32 = *r * 2;
    unsafe { *(n as *mut i32) += 1 };
    *r = t;
}

That is because the compiler knows that nobody accesses *r other than through the mutable reference, right? That's what aliasing rules are about, isn't it? (I think that is similar to the restrict keyword in C.)

P.S.: After I wrote this post, I found this: Miri not erroring on aliased mutable pointer: sound or UB? I suppose this is similar?

The whole point of the Stacked Borrows and Tree Borrows models is to make it possible to reason about when raw pointers have permissions and when they don't. I agree with @geeklint that the first example is likely not UB. The &mut reference is essentially inactive because you created a *mut raw pointer from it, just as if you had reborrowed a new &mut reference from the old one. If you wrote through the reference after creating the raw pointer, the raw pointer would be invalidated, and any further access through it would be UB.
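
A minimal sketch of that comparison, assuming the first example had roughly this shape (it is not reproduced in this thread, so the function names `with_reborrow` and `with_raw_pointer` are hypothetical). Both functions do the same thing; the second merely swaps the reborrowed &mut for a raw pointer derived from the same reference.

```rust
/// Safe version: `q` is a reborrow of `r`, and `r` is only used after `q` is done.
fn with_reborrow() -> i32 {
    let mut x: i32 = 1;
    let r: &mut i32 = &mut x;
    let q: &mut i32 = &mut *r; // explicit reborrow
    *q += 1;
    *r *= 2; // `q` is not used again after this point
    x
}

/// Raw-pointer version with the same shape: `p` is derived from `r`.
fn with_raw_pointer() -> i32 {
    let mut x: i32 = 1;
    let r: &mut i32 = &mut x;
    let p: *mut i32 = r; // coercion, acts like a reborrow of `r`
    unsafe { *p += 1 };
    *r *= 2; // under Stacked Borrows, this use of `r` invalidates `p`
    // Dereferencing `p` from here on would be UB, for example:
    // unsafe { *p += 1 };
    x
}

fn main() {
    println!("{}", with_reborrow()); // prints 4
    println!("{}", with_raw_pointer()); // prints 4
}
```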

What exactly is that? (In simple terms, or what's the best reference to read about this?) I would like to understand how I can deduce that the first example is not UB.

How about the second example though?

And what if we construct some intermediate example:

fn main() {
    let mut x: i32 = 1;
    let r: &mut i32 = &mut x;
    let n: usize = r as *mut i32 as usize;
    {
        unsafe { *(n as *mut i32) += 1 };
        *r *= 2;
    }
    println!("{x}");
}

(Playground)

Here, I just moved foo's body into a block in main.

So are all three okay? Or is one of them UB? And if yes, are both examples UB which work with usize, or only the one where I do it in a function foo?

Stacked Borrows and Tree Borrows are formal sets of rules about how raw pointers and references interact. Neither has been officially adopted by Rust yet, though. This blog post on Stacked Borrows 2 is probably the best place to start. Tree Borrows is a new contender which fixes some of the problems Stacked Borrows has, but the concepts are quite similar overall.

I think the second example is likely UB because there are some extra guarantees about references passed to functions [1]. Miri agrees that this one is UB, though it's worth noting that casting pointers to integers degrades Miri's ability to catch UB and should generally be avoided when possible.

Where you inlined the function I believe there is no UB since the function protectors don't come into play. Miri didn't report any UB, it just printed a warning about the pointer to integer cast. I removed the integer cast and it still didn't find UB.


  1. they sorta kinda get reborrowed, which would invalidate the pointer I think? ↩︎
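
For contrast, here is a sketch of a variant that should not run into the function-boundary problem, because the raw pointer is derived from the argument inside the function instead of being smuggled in as an integer. The name `foo_derived` is hypothetical, and this is a rewrite for illustration, not the code from the question.

```rust
/// Derives the raw pointer from `r` inside the function, so the pointer's
/// permission comes from the very reference the function was handed.
fn foo_derived(r: &mut i32) {
    let p: *mut i32 = r; // derived from `r`, like a reborrow
    unsafe { *p += 1 };
    *r *= 2; // `p` is not used after this point
}

fn main() {
    let mut x: i32 = 1;
    foo_derived(&mut x);
    println!("{x}"); // prints 4
}
```

Since this has the same shape as a plain reborrow inside `foo_derived`, the compiler is not entitled to hoist the read of `*r` above the write through `p`.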

Thanks, I will see if I can understand some of it.

Regarding usize: I deliberately used that to simulate some sort of opaque pointer arithmetic which the compiler can't reason about.

I agree that the first example should not be UB since p effectively acts as a reborrowed &mut from r. The second example should however be UB due to the implicit "use" of the mutable reference when passing it to the function (it's also really sketchy due to the ptr-to-int-to-ptr roundtrip, pls avoid that in real code).

I always thought that the compiler doesn't reason about what's done with raw pointers (as opposed to references). But now I feel like that must be wrong.

If it would have compiled with a regular mutable reference instead of a mutable raw pointer, then it's not UB. Your example is equivalent to a mutable reborrow, which compiles and behaves correctly. Hence, the raw pointer version must be sound.

Of course, unsafe code that can't be re-written safely can also be sound, so equivalence with compilable safe code is a sufficient but not necessary condition of soundness, yet it's a good crutch.
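
To illustrate the "sufficient but not necessary" part, here is a sketch of unsafe code that has no direct safe equivalent with plain indexing, yet is generally considered sound: handing out two mutable references to distinct elements of a slice. The helper name `two_distinct_mut` is made up for this example.

```rust
/// Returns mutable references to two different elements of `v`.
/// Panics if the indices are equal or out of bounds.
fn two_distinct_mut(v: &mut [i32], i: usize, j: usize) -> (&mut i32, &mut i32) {
    assert!(i != j && i < v.len() && j < v.len());
    let base: *mut i32 = v.as_mut_ptr();
    // SAFETY: both indices are in bounds and distinct, so the two references
    // never overlap, and both pointers are derived from the whole slice.
    unsafe { (&mut *base.add(i), &mut *base.add(j)) }
}

fn main() {
    let mut data = [1, 2, 3];
    let (a, b) = two_distinct_mut(&mut data, 0, 2);
    *a += 10;
    *b += 20;
    println!("{data:?}"); // prints [11, 2, 23]
}
```

Writing `(&mut v[i], &mut v[j])` directly is rejected by the borrow checker, so the "would it compile with &mut" test says nothing here; soundness has to be argued from the disjointness of the two elements instead.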

That can't possibly be the case. If this were true, then provenance rules would be impossible to uphold in the presence of raw pointers. (Thinking about it, they would even be meaningless to define in the first place).

For example, <[_]>::Iter only stores raw pointers, there is no actual slice within the iterator. Yet the provenance rules require that the elements of a slice must only be accessed through a pointer to the whole slice as opposed to a pointer to just the first element (even though their addresses are numerically equal).
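
A small sketch of that distinction, assuming the Stacked Borrows view of things (the function name `sum_with_slice_provenance` is hypothetical). The pointer obtained from the slice as a whole may be offset to any element; a pointer obtained from a reference to the first element alone may not, even though both start at the same address.

```rust
/// Sums a slice through a raw pointer that was derived from the whole slice.
fn sum_with_slice_provenance(data: &[i32]) -> i32 {
    let base: *const i32 = data.as_ptr(); // may access every element of `data`
    let mut total = 0;
    for i in 0..data.len() {
        // SAFETY: `i` is in bounds and `base` was derived from the whole slice.
        total += unsafe { *base.add(i) };
    }
    total
}

fn main() {
    let data = [1, 2, 3, 4];
    println!("{}", sum_with_slice_provenance(&data)); // prints 10

    // By contrast, a pointer made from a reference to a single element only
    // carries permission for that element:
    // let first: *const i32 = &data[0];
    // unsafe { *first.add(1) } // same address as &data[1], but UB
}
```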

Well, I would suggest that anyone who really wants to learn more about unsafe Rust read through Ralf's Ramblings. The case in this post is about Stacked/Tree Borrows. If you don't want to spend time reading, then trust Miri instead.

fn main() {
    let mut x: i32 = 1;
    let r: *mut i32 = &mut x; // borrow stack: [r]
    let p: *mut i32 = r; // borrow stack: [r, p]
    unsafe {
        *p += 1; // borrow stack: [r, p]
        *r += 1; // borrow stack: [r]
    } // borrow stack: []
    println!("{x}");
}

I will look through Ralf's Ramblings and I also found this paper: Stacked Borrows: An Aliasing Model for Rust by Ralf Jung et al.

I assume this isn't "standardized" (as in "stable") yet, but rather a set of proposals describing how Rust currently works or should work?

I.e. can I really be sure that my very first example is sound? Is this only due to non-documented behavior of the compiler, which I might be able to rely on though due to some sort of "de-facto stability" and Rust's stability guarantees?

Stacked Borrows is not the official formal model (yet), but it has no serious competitors.

I'm pretty sure you can be confident about safe-equivalent unsafe code being sound.

I would consider Tree Borrows a serious competitor, especially considering it is experimentally available in Miri.

Stacked Borrows is generally considered to be at least as conservative as what the final memory model will be, so if your code passes Stacked Borrows there's a good chance it will be considered sound under whatever the final model ends up being.

It's certainly possible that some more esoteric unsafe code could go from sound now to unsound in the final model. But, as has been discussed already in this thread, if you're using a raw pointer in a way that would pass the borrow checker if it were an &mut, there's really no way for any memory model to consider that unsound.

I still have a few questions. First of all, I have to say that I find this very, very interesting, and I wonder why I didn't stumble upon these issues in the documentation earlier (maybe I didn't read it thoroughly, or maybe it's still under-documented).

Am I right with the following statements?

  • Pointers in Rust are more than integers. That is, they are (e.g. during code analysis) tagged with some value that ties them to the allocation they work on? (Pointer Provenance) This tag doesn't exist in the compiled binary, but can help the compiler perform certain optimizations.
  • If I do a pointer→integer→pointer round-trip conversion, then the pointer provenance may be lost.
    • Question: Can this lead to
      • UB, where it wouldn't be UB otherwise?
      • less optimized binary code?
      • both of the previous?
      • or: is it not defined/clear yet what would/will happen?[1]
  • It is not always UB if I have several &mut references with overlapping lifetimes to the same location because
    • it may be a reborrow,
    • the current draft and/or the current ideas for Rust's memory model in regard to stacked borrows (see link above) allow me to mutate data even if there is some other mutable reference to the same data. (not sure on this one, though)
  • It is not UB per se if I have a &mut reference, create a pointer from that reference, mutate the pointed-to data through the pointer, then use the mutable reference afterwards.

Besides trying to understand all these issues, I'm also interested in whether there is normative information on any of this.


To give an example:

fn main() {
    let mut x: i32 = 1;
    let r: &mut i32 = &mut x;
    let p: *mut i32 = r;
    let r2: &mut i32 = unsafe { &mut *p };
    *r2 += 1;
    println!("{r}");
}

(Playground)

Do stacked borrows make this sound too?


  1. Cite from the Unsafe Code Guidelines on that matter: "The exact form of provenance in Rust is unclear." ↩︎

Yes, provenance is similar to lifetimes in this regard – it doesn't affect code generation or semantics (of correct code), apart from optimizations making your code emit nasal demons if you don't uphold them.

I guess UB is definitely an option, and so is "I don't know". For example, if you run the following program under Miri, you'll get the following warning:

warning: integer-to-pointer cast
 --> src/main.rs:5:25
  |
5 |     let ptr_roundtrip = int as *const u64;
  |                         ^^^^^^^^^^^^^^^^^ integer-to-pointer cast
  |
  = help: This program is using integer-to-pointer casts or (equivalently) `ptr::from_exposed_addr`,
  = help: which means that Miri might miss pointer bugs in this program.
  = help: See https://doc.rust-lang.org/nightly/std/ptr/fn.from_exposed_addr.html for more details on that operation.
  = help: To ensure that Miri does not miss bugs in your program, use Strict Provenance APIs (https://doc.rust-lang.org/nightly/std/ptr/index.html#strict-provenance, https://crates.io/crates/sptr) instead.
  = help: You can then pass the `-Zmiri-strict-provenance` flag to Miri, to ensure you are not relying on `from_exposed_addr` semantics.
  = help: Alternatively, the `-Zmiri-permissive-provenance` flag disables this warning.
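
Independent of the Strict Provenance APIs the warning mentions, one low-tech way to avoid the integer-to-pointer cast is to keep the pointer a pointer and do the arithmetic on it directly. The following is only a sketch under that assumption; `bump_nth` is a hypothetical helper and not part of any API referenced above.

```rust
/// Increments element `n` of `data` using pointer arithmetic only.
fn bump_nth(data: &mut [i32], n: usize) {
    assert!(n < data.len());
    let base: *mut i32 = data.as_mut_ptr();
    // Offsetting the pointer keeps the provenance of `base`. Computing
    // `(base as usize + n * 4) as *mut i32` would instead go through the
    // integer-to-pointer cast that the warning is about.
    let p: *mut i32 = base.wrapping_add(n);
    // SAFETY: `n` is in bounds and `p` is derived from a pointer to the whole slice.
    unsafe { *p += 1 };
}

fn main() {
    let mut data = [10, 20, 30];
    bump_nth(&mut data, 1);
    println!("{data:?}"); // prints [10, 21, 30]
}
```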

Again, if this is equivalent to a reborrow, then it's for sure OK. However, it is not the case that any two aliasing &muts are OK. As far as I can tell, the existence of two simultaneous &muts is UB at least when they are independent (i.e., neither is a reborrow of the other), so the following is not allowed and is explicitly marked as UB by Miri:

let mut x: u64 = 0;
let p1 = unsafe { &mut *(&mut x as *mut _) };
let p2 = unsafe { &mut *(&mut x as *mut _) };
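
For comparison, a sketch of a variant that I would expect to be fine under Stacked Borrows: both references are created from one raw pointer, and the first is no longer used once the second exists. The function name `sequential_reborrows` is hypothetical.

```rust
/// Creates two &mut references from one raw pointer, one after the other.
fn sequential_reborrows() -> u64 {
    let mut x: u64 = 0;
    let raw: *mut u64 = &mut x;
    let p1 = unsafe { &mut *raw };
    *p1 += 1; // last use of `p1`
    let p2 = unsafe { &mut *raw }; // derived from `raw` again, not from a fresh `&mut x`
    *p2 += 1;
    // Using `p1` here, after `p2` was created, would be the problematic
    // pattern, for example:
    // *p1 += 1;
    x
}

fn main() {
    println!("{}", sequential_reborrows()); // prints 2
}
```

Note that in both this sketch and the snippet above, the trouble only starts once the older reference is actually used after the newer one has been created.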

Side note: I just figured out that on Playground

MIR
Build and show the resulting MIR, Rust’s control-flow-based intermediate representation.

(which never reported any UB) isn't the same as

Miri

Execute this program in the Miri interpreter to detect certain cases of undefined behavior (like out-of-bounds memory access).

(which does report when it detects UB). I wasn't familiar with these before.

I have now re-run the intermediate Playground here with Miri, and also see this warning now:

warning: integer-to-pointer cast

So I take it that the exact rules for using integers to calculate/create pointer values aren't worked out yet, and that it's safe(r) not to rely on them. I also saw a couple of posts on that topic on Ralf Jung's blog, but the amount of information is still overwhelming for me.

So the last playground at the bottom of this post is OK? (Miri doesn't complain either.)


My "take away" is:

  • Having a &mut reference somewhere does not mean that all mutations of the pointed-to data must go through that reference (e.g. due to reborrows or pointer operations that are equivalent to a reborrow).
  • Integer-to-pointer casts are still under debate and the rules are not clear yet.

Hope I got it right now.

There are a lot of snippets and links/references in this post, but if you mean this one, then it's OK, because again, it's equivalent to safe code that compiles.

Yes, that's the one I meant.

There is a great talk about strict provenance from RustConf 2022.

Reborrows “go through that reference” by definition. If you are trying to say that it’s ok to conjure up an aliasing &mut, then I disagree in general. However, as I understand it, there are still unanswered questions with regard to mutable aliasing. For instance:

And the followup post contains more detail on this statement: What is a good mental model of borrow checker? - #46 by CAD97

More than that, integers do not have provenance so the conversion throws away information that is necessary to prove all behavior is defined. I don't think this much is controversial. What's still up for debate is whether strict provenance will be adopted (and in what form). And apparently, even the formal definition of a pointer is still unclear:

So, what is a pointer? I don’t know the full answer to this. In fact, this is an open area of research.

Quoted from Pointers Are Complicated, or: What's in a Byte?

Also relevant: The Tower of Weakenings: Memory Models For Everyone - Faultlore from the author of Strict Provenance.

Edit: And I just remembered this absolute gem: make use of LLVM's scoped noalias metadata · Issue #16515 · rust-lang/rust · GitHub (Yeah, I dunno. Trying to reason about this counterexample breaks my brain.)
