I would like to understand why the example below can be UB.
My initial understanding is that we only pin for lifetime 'a, so we only need to guarantee the pinning contract during that lifetime. I do not understand what could lead to UB after lifetime 'a ends.
Is it because of possible compiler reordering?
```rust
use std::mem;
use std::pin::Pin;

fn move_pinned_ref<T>(mut a: T, mut b: T) {
    unsafe {
        let p: Pin<&mut T> = Pin::new_unchecked(&mut a);
        // This should mean the pointee `a` can never move again.
    }
    mem::swap(&mut a, &mut b); // Potential UB down the road ⚠️
    // The address of `a` changed to `b`'s stack slot, so `a` got moved even
    // though we have previously pinned it! We have violated the pinning API contract.
}
```
Concretely, for pinned data you have to maintain the invariant that its memory will not get invalidated or repurposed from the moment it gets pinned until when drop is called.
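For contrast, a minimal sketch of the usual way to uphold that invariant without any unsafe code: pin the value on the heap with Box::pin, where the allocation keeps its address until the box is dropped.

```rust
use std::pin::Pin;

fn heap_pin_sketch<T>(value: T) -> Pin<Box<T>> {
    // `Box::pin` moves the value into a heap allocation and pins it there.
    // That allocation is neither invalidated nor repurposed until the
    // `Pin<Box<T>>` is dropped, which is exactly the invariant quoted above.
    // For a `T` that is `!Unpin`, safe code can then no longer obtain a
    // `&mut T` to feed into `mem::swap`, so the value cannot be moved out.
    Box::pin(value)
}
```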
@cuviper, thanks. To me that just restates the rules; I would like to understand the underlying reasons. Can you elaborate further on why the example above could be UB?
I meant: why can't the compiler rely on the fact that the value is only pinned for lifetime 'a, so that we no longer need that guarantee after lifetime 'a ends, i.e. once p has been dropped?
Does the compiler do some optimization based on the knowledge that T was pinned before, while neglecting the fact that it was only pinned for lifetime 'a, and can that conflict with a later mem::swap or any other memory-invalidating operation?
The compiler doesn't rely on Pin; it's the type being pinned that wants to rely on Pin. Most of the time such types are self-referential: they contain some kind of pointer to another element of the same value. The most notable examples are the types produced by async fns and blocks.
The problem is that if you move an instance of such a type, the pointed-to element gets moved, but the pointer still points to the old location! So these types need a guarantee that they won't be moved in between function calls, drop included.
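To make that concrete, here is a hypothetical self-referential struct (the type and field names are made up for illustration); moving an instance leaves its internal pointer dangling at the old location:

```rust
// Hypothetical self-referential type, for illustration only.
struct SelfRef {
    data: String,
    // Raw pointer intended to point into `data` above. A raw pointer is used
    // because a reference field would run into borrow-checker problems.
    ptr_to_data: *const String,
}

fn sketch() {
    let mut a = SelfRef {
        data: String::from("hi"),
        ptr_to_data: std::ptr::null(),
    };
    a.ptr_to_data = &a.data; // points at `a`'s current stack slot

    let b = a; // move: `data` now lives in `b`'s stack slot...
    // ...but `b.ptr_to_data` still holds the address of `a`'s old slot.
    // Dereferencing it now would be UB. `Pin` exists so that a type like
    // this can demand "never move me again" once it has been pinned.
    let _ = b;
}
```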
If, however, you guarantee pinning only for a lifetime 'a, then how do you express the previous requirement?
Consider a function taking a Pin<&mut Self>: the lifetime is only guaranteed to be valid for the current function call; there's no link to subsequent function calls. But you also can't use longer lifetimes, because if you do, the value will be unusable for that whole lifetime since it is already borrowed!
Hi, thanks. I understand the self-referential types and the async cases; my question is specifically about new_unchecked() in the code example above from the std docs. Because new_unchecked() accepts both Unpin and !Unpin types, let me split it into two cases.

If T is an Unpin type:
- p is dropped at the end of the unsafe block, which to my understanding tells the compiler that I no longer need the pinning contract;
- after that, why could it be UB down the road, even if I can guarantee that I use a correctly?

If T is !Unpin:
- then the mem::swap is obviously incorrect usage.
If T: Unpin, then there is no pinning contract, and the only purpose of constructing Pin is to satisfy some other generic code that wants Pin (e.g. the trait method Future::poll, which takes Pin<&mut Self> even if the type only needs &mut Self). So there is no problem and no UB, in that case.
The only disallowed case is moving (swapping or otherwise) something that is !Unpin and was previously pointed to via a Pin. You can always move something that is Unpin or which has never yet been pinned.
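A small sketch of the Unpin half of that statement: for Unpin types there is the safe Pin::new constructor, and moving or swapping afterwards is completely fine.

```rust
use std::mem;
use std::pin::Pin;

fn unpin_case() {
    // `i32: Unpin`, so no pinning guarantee is ever in force for it.
    let mut a = 5i32;
    let mut b = 6i32;
    {
        // Safe constructor: only available because `i32: Unpin`.
        let _p: Pin<&mut i32> = Pin::new(&mut a);
    }
    // Not UB: pinning an `Unpin` type imposes no restriction at all,
    // so swapping (i.e. moving) it afterwards is allowed.
    mem::swap(&mut a, &mut b);
    assert_eq!((a, b), (6, 5));
}
```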
If T implements Unpin then it is declaring that it doesn't care about pinning, so any pinning requirement doesn't apply to it. Of course the problem is if T: !Unpin.
What can call_pinned rely on? If call_pinned is called once, can it be sure that the next time it is called the struct will still be pinned to the same memory location? If you allow pinning only for some lifetime then you can do this:
```rust
// Create an instance of `SelfRefStruct`
let mut self_ref_struct: SelfRefStruct = todo!();

{ // 'a start
    // Pin for 'a and call `call_pinned`
    unsafe { Pin::new_unchecked(&mut self_ref_struct) }.call_pinned();
} // 'a end

// Move to another stack location
let mut new_self_ref_struct = self_ref_struct;

{ // 'b start
    // Pin for 'b and call `call_pinned`
    unsafe { Pin::new_unchecked(&mut new_self_ref_struct) }.call_pinned();
} // 'b end
```
But this is bad: SelfRefStruct wants to be pinned for a single lifetime that covers both call_pinned calls. Being pinned for two different lifetimes defeats the point of being pinned, because in between it could be moved.
This is what I meant by "Consider a function taking a Pin<&mut Self>: the lifetime is only guaranteed to be valid for the current function call, there's no link to the next function calls". call_pinned only knows self is pinned when the function is called, but the moment it returns you can already end the pinned borrow.
For the second part, "But you also can't use other lifetimes because if you do then the reference will be unusable for that lifetime because already borrowed!": this isn't really reasonably expressible in code. You could make call_pinned take Pin<&'static mut Self>, but that is obviously going to fail. You could add a lifetime parameter to SelfRefStruct, but it would have no meaning and would lead you into a series of nonsensical errors.
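To illustrate what a method like call_pinned might actually be relying on, here is a hypothetical sketch (the struct body is invented; the thread only names SelfRefStruct and call_pinned): the method records its own address on the first call and expects every later call to see the same address, which is only sound if the value stays pinned in one place until drop.

```rust
use std::marker::PhantomPinned;
use std::pin::Pin;

struct SelfRefStruct {
    observed_addr: Option<*const SelfRefStruct>,
    // Opting out of `Unpin`, so the pinning guarantee actually applies.
    _pin: PhantomPinned,
}

impl SelfRefStruct {
    fn call_pinned(self: Pin<&mut Self>) {
        let addr: *const SelfRefStruct = &*self;
        // SAFETY: we never move the value out of `this`; we only read and
        // update the `observed_addr` field in place.
        let this = unsafe { self.get_unchecked_mut() };
        match this.observed_addr {
            // First call: remember where we live.
            None => this.observed_addr = Some(addr),
            // Later calls assume the address is unchanged. If the caller
            // pinned only for a short lifetime and then moved the value
            // (as in the snippet above), this assumption is broken.
            Some(prev) => assert_eq!(prev, addr, "value moved while pinned"),
        }
    }
}
```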
This gets into the tricky split between what we call "library UB" and "language UB". "Language UB" is the obvious case of UB, when you attempt to do some operation that the language declares to be UB, such as dereferencing a null pointer. "Library UB" is a softer concept: you've broken the invariants of some unsafe API, and now all guarantees are off, but at the library level. It might immediately cause "language UB," or it might not, but by causing "library UB" you've given "permission" for that library to arbitrarily cause "language UB" at any point.
This is especially notable when updating library versions. "Language UB" by construction only cares about what code actually gets executed. "Library UB" cares about the documented API requirements, and updating the library code can cause "library UB" to manifest as "language UB" where it previously did not.
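A concrete illustration of that split (my own example), using a well-known standard-library contract: passing non-UTF-8 bytes to str::from_utf8_unchecked violates its documented safety requirement, which is "library UB" on its own; whether and when that turns into "language UB" depends on what the current version of the library code later does with the broken invariant.

```rust
fn library_ub_sketch(bytes: &[u8]) -> char {
    // Library UB if `bytes` is not valid UTF-8: the documented contract of
    // `from_utf8_unchecked` is violated, regardless of what happens next.
    let s = unsafe { std::str::from_utf8_unchecked(bytes) };
    // Language UB may (or may not) materialize here, depending on how this
    // particular standard-library version walks the malformed string.
    s.chars().next().unwrap_or('?')
}
```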
The standard library documentation currently doesn't make much of a distinction between the two. In the future, that's an area we'd like to improve on.
(Disclaimer: I'm on T-opsem, but speaking in general terms only, not on behalf of the team.)
Just for me to understand the concept/terminology of UB better, I have a further question.
Isn't it also the case with "language UB" that some operations are either
- "declared" to cause "undefined behavior" because the compiler may change in the future, or
- actually causing unpredictable behavior for real?

So I would see at least three different flavors of operations which cause UB:
- operations which really cause unpredictable behavior of the program
- operations which are declared on the language level to be UB (to allow changes in the language)
- operations which are declared on the library level to be UB (to allow changes in the library)
Or am I mistaken?
From a programmer's side, all three should be avoided, I guess, unless you're working on the standard library and can rely on how the language and/or standard library will change.
These are most often declared as unspecified, not undefined. That is, they will not result in the program being malformed (which would allow any possible compilation output); it's just that the result of this specific operation is explicitly documented as unstable.
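An everyday example of "unspecified, not undefined" (my own illustration): the iteration order of a HashMap is documented to be arbitrary. The program below is perfectly well-defined; you just may not rely on which entry is printed first, and that may change between runs or releases.

```rust
use std::collections::HashMap;

fn unspecified_not_undefined() {
    let map: HashMap<&str, i32> = [("a", 1), ("b", 2), ("c", 3)].into_iter().collect();
    // Sound on every execution, but the order of the lines is not guaranteed.
    for (key, value) in &map {
        println!("{key} = {value}");
    }
}
```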
I specifically meant operations which may result in UB in the future, even though currently they do not cause crashes and are somewhat predictable (until things change in the future).
I do not see how such a path could exist; do you have an example?
My interpretation of UB is "behavior that cannot be reasoned about, even as incorrect behavior".
We either have correct behavior or incorrect behavior, and incorrect behavior is either deterministic or undefined.
Deterministic incorrect behavior is incorrect behavior that we can reason about; undefined behavior comes with no guarantees and cannot be reasoned about.
I'm thinking of this example, where it's debated whether creating a mutable reference to uninitialized values is considered UB (even though it currently seems de-facto sound to do, if you don't read from it, but that is not guaranteed (yet, I guess)).
So Rust could declare that creating a mutable reference to uninitialized values is undefined behavior even if it doesn't cause any problems for now.
P.S.: Maybe the linked example isn't the best example for what I meant, because there it's debated whether the "undefinedness" is lifted (or even has been de-facto lifted), which is more of an unclear state rather than one of the three cases I listed.
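For completeness, the pattern that sidesteps the debated operation entirely is MaybeUninit, which lets you initialize memory without ever creating a &mut to an uninitialized value; a minimal sketch:

```rust
use std::mem::MaybeUninit;

fn init_without_reference() -> u32 {
    // No `&mut u32` to uninitialized data is ever produced here.
    let mut slot: MaybeUninit<u32> = MaybeUninit::uninit();
    slot.write(42); // initializes the value through the `MaybeUninit`
    // SAFETY: the value was just initialized by `write` above.
    unsafe { slot.assume_init() }
}
```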
I think it is important to point out that UB is not guaranteed to result in unpredictable behavior. You might do something that legitimately triggers language UB, and then by chance the compiler optimizes it in a way that doesn't break anything. There are even types of language UB that predictably behave in this way on today's compiler.
One way this can happen is that maybe the circumstances under which the compiler takes advantage of something are really complicated, so we just decide that it's never allowed, to simplify the rules (and to allow subtle changes to those complicated circumstances).
Another way, as @jbe mentions, can be when they have not yet decided what the actual rules should be, so for now the rule is just "don't do it", and then the rules can be relaxed later once a good way to relax them has been found.
When the specification is imprecise, I would say that's a bad specification, not UB. Unfortunately, Rust is often "under-defined" (e.g. here), which I can understand considering Rust's rapid development and young age.
If either the language or the library defines that some operation results in UB, then executing that operation is UB, whether it actually crashes, does nothing, or deletes your hard drive. It doesn't matter what it does. It just could do anything and thus must be avoided.
In case of that example that I linked, the rules regarding whether some behavior is (or should be) defined as exhibiting UB are somewhat transient/debated, and it would be safer to assume that the code in that linked example exhibits UB.