Let's say I've written some code that borrows a shared reference to some data, but I happen to be able to prove that it is the only shared reference to that data. Is it okay to play pointer tricks in order to "upgrade" it to an exclusive reference?
The real use case here is writing an iteration algorithm only once, for shared references, and then implementing iter_mut on top of it. Because iter_mut borrows the container exclusively, I know that the shared references returned by iter are the only ones that currently exist.
Here is a simplified example (playground). Is this UB?
struct S<T>(T);
fn upgrade_ref<'a, T>(s: &'a mut S<T>) {
    let t: &T = &s.0; // No problem.
    let p: *const T = t; // No problem.
    let p: *mut T = p as *mut T; // No problem.
    drop(t); // There is no shared reference left (?).
    let t_mut: &'a mut T = unsafe { &mut *p };
}
Well okay, now that I write that I see that the compiler has some data flow analysis that is smart enough to tell me off somehow:
error: casting `&T` to `&mut T` is undefined behavior, even if the reference is unused, consider instead using an `UnsafeCell`
--> src/lib.rs:10:37
|
8 | let p: *mut T = p as *mut T; // No problem.
| ----------- casting happened here
9 | drop(t); // There is no shared reference left (?).
10 | let t_mut: &'a mut T = unsafe { &mut *p };
| ^^^^^^^
|
= note: for more information, visit <https://doc.rust-lang.org/book/ch15-05-interior-mutability.html>
= note: `#[deny(invalid_reference_casting)]` on by default
So I guess it's not supposed to be allowed. But then my question is: why? What reasonable optimization is impeded by the assumptions in this function?
And then, assuming it's UB for a good reason, what is the usual pattern for not needing to write an iterator twice for each container? I guess I can write an iterator that produces raw pointers and map those to references of each type; is there a better way?
if you can prove it at compile time, then you should already have an &mut T. if you can only prove it at runtime, then you must bypass the type system and borrow checker, i.e. use raw pointers instead.
it's UB to do anything that's equivalent to transmutation from a shared reference to exclusive reference.
raw pointers must be used, and only if the raw pointer has provenance that's derived from an exclusive borrow.
this particular example is ok. the raw pointer has the correct provenance, and the access pattern doesn't violate the rules for shared and exclusive references.
note, drop(t) is not needed, and it's useless. the lifetime analysis is based on last use of the reference, not dropping scopes. besides, shared references are Copy, so dropping it is meaningless.
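To illustrate the point about drop(t), here is a minimal sketch. The function name `drop_does_not_end_borrow` is made up for this example; it only shows that calling `drop` on a shared reference drops a copy of it, and that the borrow ends at the reference's last use.

```rust
fn drop_does_not_end_borrow() -> (i32, i32) {
    let mut x = 5;
    let r: &i32 = &x;
    // `&i32` is `Copy`, so `drop` receives a copy of the reference; `r` itself is unaffected.
    // (Newer compilers may warn that this call does nothing.)
    drop(r);
    // `r` is still usable after the `drop` call.
    let seen = *r;
    // The borrow ended at the last use of `r` above, so mutating `x` is accepted here.
    x += 1;
    (seen, x)
}
```

Removing the `drop(r)` line changes nothing about what the borrow checker accepts.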
oops, misread the code snippet. the code as-is is wrong, but if you change it to this, then it is ok:
fn upgrade_ref<'a, T>(s: &'a mut S<T>) {
+ let p: *const T = &mut s.0; // No problem.
let t: &T = &s.0; // No problem.
- let p: *const T = t; // No problem.
let p: *mut T = p as *mut T; // No problem.
drop(t); // There is no shared reference left (?).
let t_mut: &'a mut T = unsafe { &mut *p };
}
Transmuting an & to &mut is always Undefined Behavior.
No you can’t do it.
No you’re not special.
You can try running your code under Miri; it will report UB. That's because, even if you can prove that a shared reference is unique, the compiler will still assume that the pointee cannot be mutated through it.
there is a way to handle "this is behind a shared reference but i am confident i have exclusive access to it at this time", and it is UnsafeCell or RefCell, depending on whether you want a runtime check of this assumption.
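For the runtime-checked variant, a small sketch with `std::cell::RefCell` might look like this. The function names `add_one` and `second_exclusive_borrow_fails` are invented for illustration; `borrow_mut` and `try_borrow_mut` are the standard library methods.

```rust
use std::cell::RefCell;

/// Mutates the value behind a shared reference; `RefCell` checks exclusivity at runtime.
fn add_one(counter: &RefCell<i32>) -> i32 {
    // Panics if another borrow of the contents is currently active.
    let mut guard = counter.borrow_mut();
    *guard += 1;
    *guard
}

/// Returns true if a second exclusive borrow is refused while the first one is alive.
fn second_exclusive_borrow_fails(counter: &RefCell<i32>) -> bool {
    let _first = counter.borrow_mut();
    // `try_borrow_mut` returns an error instead of panicking.
    counter.try_borrow_mut().is_err()
}
```

If the "I have exclusive access" assumption is ever wrong, this version fails loudly at runtime instead of being UB.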
regarding iterators: yeah, you need two distinct functions for mutable and immutable iteration. it might be a bit annoying, but it also loosens your constraints for immutable iteration, so you can make it quite a bit more optimized in a lot of cases
// assumes a struct with a single field `t`, e.g. struct S<T> { t: T }
fn bad_ref_use<T: Copy>(s: &mut S<T>, new_value: T) -> T {
    let t = &s.t;
    // inventing a read of s.t is fine here, because it's not legal to mutate s.t
    // through t and there is no other access to s
    // (instantly) UB transmute
    let t_mut = unsafe { &mut *(t as *const T as *mut T) };
    // not required for UB, but to demonstrate the issue
    *t_mut = new_value;
    // perfectly valid to use the invented read here, returning the old value
    s.t
}
But as with most optimizations, this compounds quickly once you start getting multiple procedures etc. involved. This "code motion" optimization - being able to move accesses and computation around in the generated code - is fundamentally important for generating better code, and it depends on being able to quickly classify large branches of the data paths for values as not interacting with each other: exactly what a shared ref provides.
The correct way to do this is to have a &UnsafeCell<T> instead of a &T. You can then convert an &UnsafeCell<T> into a &mut T as long as no other &mut T exists.
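A minimal sketch of that conversion, assuming the caller really can guarantee exclusivity. The helper names `upgrade` and `bump` are hypothetical; `UnsafeCell::get` is the standard library method that returns a `*mut T`.

```rust
use std::cell::UnsafeCell;

/// Hypothetical helper: turns a shared reference to a cell into an exclusive
/// reference to its contents.
///
/// # Safety
/// The caller must guarantee that no other reference to the contents is live
/// for as long as the returned `&mut T` is used.
unsafe fn upgrade<T>(cell: &UnsafeCell<T>) -> &mut T {
    // `get` returns a `*mut T` that is allowed to be written through.
    unsafe { &mut *cell.get() }
}

/// Example caller: exclusivity follows from holding `&mut` to the cell itself.
fn bump(slot: &mut UnsafeCell<i32>) -> i32 {
    let shared: &UnsafeCell<i32> = slot;
    // SAFETY: `slot` is borrowed exclusively for this whole function and
    // `shared` is the only path used to reach the contents.
    let value: &mut i32 = unsafe { upgrade(shared) };
    *value += 1;
    *value
}
```

When you already hold a `&mut UnsafeCell<T>`, the safe `UnsafeCell::get_mut` does the same job; the unsafe route is only needed when all you have is `&UnsafeCell<T>` plus an argument for exclusivity that the compiler cannot see.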
Thanks @simonbuchan for the example of an optimization that motivates this. I totally believe it's considered "UB in spirit", although I'm not 100% sure I can see why this is UB based on what's listed in [undefined.alias]. I would love to know if someone can be a little more precise about why specifically this is problematic than just "you should feel bad about even thinking of it" vibes.
Thanks, this is probably the real answer in practice—store the values as UnsafeCell<T> internally, have an internal iterator that provides &UnsafeCell<T>, and map its result to &T for iter and &mut T for iter_mut.
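Roughly, I imagine that looking like the sketch below. `Bag`, `cells`, and the trivial `Vec`-backed traversal are placeholders I made up; the real traversal would be whatever complicated algorithm you only want to write once.

```rust
use std::cell::UnsafeCell;

/// Hypothetical container that stores its elements in `UnsafeCell`s.
pub struct Bag<T> {
    items: Vec<UnsafeCell<T>>,
}

impl<T> Bag<T> {
    pub fn new(values: Vec<T>) -> Self {
        Bag { items: values.into_iter().map(UnsafeCell::new).collect() }
    }

    /// The traversal, written once. It must yield each cell at most once.
    fn cells(&self) -> impl Iterator<Item = &UnsafeCell<T>> {
        self.items.iter()
    }

    pub fn iter(&self) -> impl Iterator<Item = &T> {
        // SAFETY: `&self` is a shared borrow, so no `&mut T` handed out by
        // `iter_mut` can be live; the contents are only read.
        self.cells().map(|c| unsafe { &*c.get() })
    }

    pub fn iter_mut(&mut self) -> impl Iterator<Item = &mut T> {
        // SAFETY: `&mut self` is an exclusive borrow of the whole container,
        // and `cells` yields each cell at most once, so the `&mut T`s never alias.
        self.cells().map(|c| unsafe { &mut *c.get() })
    }
}
```

One cost of this design: `UnsafeCell<T>` is not `Sync`, so the container is not `Sync` either unless you add an `unsafe impl Sync` with its own justification.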
Right. But in my example the &T isn't live once the memory is mutated, or at least the example can be minimally changed to make that so. Further, the posts above claim that the mere existence of the &mut T is UB, not mutating memory with it.
I think it’s critical to note that pointers mostly remember where they came from (as described in the language semantics, as used in compiler optimizations, and as implemented by interpreters like Miri, even if not at runtime in compiled code). Even if you drop the source & reference, in order to continue to use other pointers or references derived from it, the source & permissions must not be violated. I suppose the source permissions must still be “live” in some sense.
Though, I’m not entirely confident about the granularity of pointer permissions/provenance - I think it’s per-byte. That is, you could have a &u16, get a *const u8 to one of its bytes, drop the &u16, write to the other byte, and read from the *const u8 without UB.
Indeed. You can think of creating a & as giving the compiler permission to read bytes from the referent and creating a &mut as giving the compiler permission to read or write bytes from the referent (up to the length of the referent), as long as it does not change the behavior of your code (unless the code already had UB).
The compiler does actually insert “fake reads” and “fake writes” of & or &mut references in some places, such as the start of functions. Theoretically, it could introduce them in more places (again, so long as it does not change the behavior of a UB-free program).
AFAIK it’s still an open question as to whether the compiler is allowed to assume that the referent of a &T or &mut T is always a valid T, but the opsem team is leaning towards “no”: reads (and possibly writes) of the referent are permitted, but the compiler cannot assume that whatever bytes it reads always form a T. (In other words, if the opsem team affirms this, you could cast a &mut [MaybeUninit<u8>] to &mut [u8], and if whoever uses the &mut [u8] only writes to the slice and never reads from it, it could be sound.)
Transmuting an & to &mut is Undefined Behavior. While certain usages may appear safe, note that the Rust optimizer is free to assume that a shared reference won’t change through its lifetime and thus such transmutation will run afoul of those assumptions. So:
Transmuting an & to &mut is always Undefined Behavior.
No you can’t do it.
No you’re not special.
Well, Rust's UB is just the set of things Rust defines as UB, if you really want to get pedantic and reductionist about it.
Semantically it's probably best to think of it in the context of "a programming language is a definition of the interface between a programmer and a compiler" - for example the language defines what an "if statement" is in terms of the syntax and semantics: it is something the compiler must implement in order to be called a Rust compiler, and something the programmer can type in a Rust source file to get the described behavior.
In this context, Undefined Behavior is a declaration that a Rust source file does not contain this - if you type statements that would transmute a ref to a mut ref, then you are not meeting the contract required of you by the language, just as much as a compiler failing to produce the semantics of an if statement would be. In other words, the input is no longer Rust source code.
Most of the time you don't need to worry about this in safe languages: the language requires the compiler to reject any input that does not meet the specified syntax or semantics (with the strange effect that a file containing "this is not rust source code" is in fact a Rust source file, just an invalid one). For unsafe, though, the burden is shifted onto you, and you can produce files which a compiler is not required to (and generally won't) reject, but which are not valid Rust code.
So, yeah, it's pretty much "you should feel bad", because you're letting down the compiler.
And the best solution in terms of soundness is still to implement iter_mut separately.
If your iter somehow (due to a bug?) goes twice over the same element, that's safe by itself, but the upgrade UnsafeCell->&mut T will then produce two mutable references to the same element, which is UB.
So you need stricter safety justifications on any unsafe in iter_mut; better if there is not any unsafe code at all.
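As an illustration of a separately implemented mutable iterator with no unsafe code at all, here is a sketch over a plain slice. `SafeIterMut` is a made-up name, and a real container would need its own traversal; the `mem::take` plus `split_first_mut` combination is one common way to hand out `&'a mut T` items safely.

```rust
/// Hypothetical hand-written mutable iterator over a slice, without `unsafe`.
pub struct SafeIterMut<'a, T> {
    rest: &'a mut [T],
}

impl<'a, T> SafeIterMut<'a, T> {
    pub fn new(slice: &'a mut [T]) -> Self {
        SafeIterMut { rest: slice }
    }
}

impl<'a, T> Iterator for SafeIterMut<'a, T> {
    type Item = &'a mut T;

    fn next(&mut self) -> Option<&'a mut T> {
        // Move the remaining slice out of `self` (leaving an empty slice behind)
        // so it can be split with the full lifetime `'a`.
        let rest = std::mem::take(&mut self.rest);
        // Returns `None` once the slice is empty.
        let (first, tail) = rest.split_first_mut()?;
        self.rest = tail;
        Some(first)
    }
}
```

Here the borrow checker itself rules out yielding the same element twice, so there is no safety argument left to get wrong.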
Maybe it helps to think of the “vibes” this way: &T suggests something like a nice, boring, Rusty pointer to stable data, almost like data in ROM, or at least memory that is not going to change.
So when the compiler sees something like let v = *some_ref, it is free to lean into those vibes and think: “Alright, this is just a read from stable memory. I might do that read later, or do it more often than the source code seems to suggest, especially if I am under the usual constant pressure from my tragically limited CPU registers.”
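A tiny sketch of what that freedom means in practice. The function `sum_around_call` is invented for this example, and whether a given compiler version actually performs the transformation is not guaranteed; the point is only that it is allowed to.

```rust
/// Reads through a shared reference before and after calling unknown code.
fn sum_around_call(some_ref: &i32, opaque: impl FnOnce()) -> i32 {
    let before = *some_ref;
    // The compiler cannot see what `opaque` does...
    opaque();
    // ...but `i32` contains no `UnsafeCell`, so it may assume this read yields
    // the same value as `before` and reuse it, or move the read elsewhere.
    let after = *some_ref;
    before + after
}
```

If something had "upgraded" a shared reference to the same `i32` and written through it inside `opaque`, the optimized and unoptimized versions of this function could disagree, which is exactly the kind of situation the rule exists to exclude.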
It is disallowed on principle, so that the compiler can make simplified and generalised assumptions that hold all the time, everywhere, without taking on the complexity of checking those assumptions on a case-by-case basis.
There will be code in the compiler that performs some analysis or transformation that is only valid if the data is frozen, and it does so blindly and unconditionally on &-without-UnsafeCell.
You're thinking about your code in this specific case and for you this is obvious from what you're looking at. But the compiler is written to compile all code from anywhere written by anyone, and has to be correct for everything from trivial cases to massively tricky jungles of references in huge programs.
Implementing and optimizing for theoretical simplifications is easier and less computationally expensive than compiling with fewer broad guarantees and with the need to re-check the actual context to track precisely where something immutable is actually mutable.
Please read the Rustonomicon before writing unsafe code.
a few paragraphs above ...
... since I also think your example is not covered by
&T must point to memory that is not mutated while they are live
nor
&mut T must point to memory [...] that no other reference points to while they are live.
as there are no live references pointing to *p at the point where t_mut is created. (Well, there is s. Is that a problem? And now that I think of it ... how does this latter part not cover reborrows?)