Testing my knowledge of Pin

Yesterday I got some helpful explanations of Pin and I spent time studying it. I'm pretty sure I now know all that I need to know at a basic level so I would just like to know if the following understanding is correct.


Basic

Pin is just an API formalization for semi-safe usage of structs/objects that have very unsafe ways to use them: namely self-referential structs, but also other types of data structures, like an intrusive linked list, with ways to unsafely drop nodes. All the issues are related to moving data that shouldn't be moved in certain situations.
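
To make the "moving data that shouldn't be moved" problem concrete, here is a minimal sketch. The `SelfRef` type and its fields are made up for illustration. It only compares addresses and never dereferences the stale pointer, so it needs no unsafe code.

```rust
// Hypothetical self-referential struct: `self_ptr` is meant to point at `value`.
struct SelfRef {
    value: u32,
    self_ptr: *const u32,
}

impl SelfRef {
    // True while the stored pointer still refers to this struct's own `value`.
    fn points_at_own_value(&self) -> bool {
        std::ptr::eq(self.self_ptr, &self.value)
    }
}

fn demo() -> (bool, bool) {
    let mut on_stack = SelfRef { value: 1, self_ptr: std::ptr::null() };
    on_stack.self_ptr = &on_stack.value;
    let before = on_stack.points_at_own_value();
    // Moving the struct into a Box copies its bytes to a new address,
    // but the stored pointer still holds the old stack address.
    let moved = Box::new(on_stack);
    let after = moved.points_at_own_value();
    (before, after)
}

fn main() {
    let (before, after) = demo();
    println!("valid before move: {before}, valid after move: {after}");
}
```

Nothing in safe Rust stops that move, which is the gap Pin is meant to close.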

[note to my future self] Typically this movement is heap related, because heap data can only be referred to with pointers, and when things get moved on the heap those pointers dangle. Even growing objects on the heap can move them if space is unavailable. Stack variables are updated in place, so they are much less of a problem, typically, platform specifics notwithstanding. When a struct on the stack moves (meaning it changes address), its fields that are not heap allocated move with it.

API

Okay so the people implementing these unsafe-to-move data structures have to expose a carefully crafted api that still allows mutation, but doesn't dangle the pointer. How this is done is up to the implementers and has nothing to do with Pin. Where Pin comes in is it allows these implementers to write an api with Pin<..> as the signature. Labeling the functions safe to use through the Pin api itself.

fn init_me(_: Pin<&mut T>)
fn mutate_me(_: Pin<&mut T>)
// etc. 

What's nice about this as opposed to just making the implementation of fn init_me(_: &mut T) api be safe to use in the same way as the Pin version, is that you get some extra guarantees by forcing your users to rely on Pin. Such as Pin not being able to deref (safely) *pinned_var. So they can't thwart your api. They have to go through it.
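
A rough sketch of what such an API could look like, assuming a made-up `Node` type that stores a pointer to its own field. The names `init_me` and `is_consistent` are illustrative only and not taken from any real crate.

```rust
use std::marker::PhantomPinned;
use std::pin::Pin;

// Hypothetical address-sensitive type; `PhantomPinned` opts it out of `Unpin`.
struct Node {
    value: u32,
    // Points at `value` once `init_me` has run; null before that.
    self_ptr: *const u32,
    _pin: PhantomPinned,
}

impl Node {
    fn new(value: u32) -> Self {
        Node { value, self_ptr: std::ptr::null(), _pin: PhantomPinned }
    }

    // Callable only through a pinned handle, so the stored address stays valid.
    fn init_me(self: Pin<&mut Self>) {
        // SAFETY: we only write fields and never move the value out of the reference.
        let this = unsafe { self.get_unchecked_mut() };
        this.self_ptr = &this.value;
    }

    // Shared access works through `Pin<&Self>`.
    fn is_consistent(self: Pin<&Self>) -> bool {
        std::ptr::eq(self.self_ptr, &self.value)
    }
}

fn demo() -> bool {
    let mut node: Pin<Box<Node>> = Box::pin(Node::new(7));
    node.as_mut().init_me();
    // let raw: &mut Node = &mut *node; // does not compile: `Node` is `!Unpin`
    node.as_ref().is_consistent()
}

fn main() {
    println!("self pointer still valid: {}", demo());
}
```

The commented-out line is the part users cannot do in safe code, which is what forces them through the pinned methods.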

Pin Projection

Other convenience is pin projection. Typically these unsafe data structures are behind smart pointers like Box. So Pin is almost always a compound type. In the case of Pin<Box<T>>, Pin never exposes &T or &mut T. You only get access to Pin<&T> and Pin<&mut T>.

Unpin

Sometimes you will need to work with either of two kinds of objects (with respect to the move issues I mention at the start of this post):

  1. dangerous to use objects
  2. safe objects

But both these objects will be implementing some common trait. This poses a problem. If the dangerous objects only expose their api using Pin<..>, we would have to make the trait methods use Pin<..>. We need trait methods like fn poll(_: Pin<&mut Self>) even though the safe objects don't care for it. So how do we not require the safe implementer to be created as a Pin<Box<T>>, which would require a wasteful Box::pin(T)?

The Box is the problem there, it is not needed for the safe object. That's where Unpin comes in. We would still need to know if a particular object is "safe" or not and when we do know, we can mark it as implementing Unpin (most types already impl it automatically; only ones containing a !Unpin field such as PhantomPinned don't).

We just need to be explicit about marking Unpin on the trait bound of the safe object in question and if we do that, and the object is indeed Unpin, we can store the object without Pin, and when needed, we just call Pin::new(&mut obj) and get access to those trait methods I described but without having to box. This is a "zero-cost" operation that only exists for the compiler to verify at compile time.
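
A small sketch of that pattern, assuming a made-up trait `Step` with a pinned-receiver method and a plain `Counter` type that is automatically Unpin:

```rust
use std::pin::Pin;

// Hypothetical trait whose method demands a pinned receiver.
trait Step {
    fn poll1(self: Pin<&mut Self>) -> u32;
}

// A plain counter: no self-references, so it is automatically `Unpin`.
struct Counter {
    n: u32,
}

impl Step for Counter {
    fn poll1(mut self: Pin<&mut Self>) -> u32 {
        // `Counter: Unpin`, so `Pin<&mut Counter>` implements `DerefMut`.
        self.n += 1;
        self.n
    }
}

// The `Unpin` bound is what makes the safe `Pin::new` available.
fn drive<T: Step + Unpin>(obj: &mut T) -> u32 {
    Pin::new(obj).poll1()
}

fn demo() -> (u32, u32) {
    let mut c = Counter { n: 0 }; // lives on the stack, no Box
    let first = drive(&mut c);
    let second = drive(&mut c);
    (first, second)
}

fn main() {
    println!("{:?}", demo());
}
```

Between the two calls `c` is an ordinary variable again and could be moved freely.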

Pin having certain deref constraints on its type has some implications. If we have a struct that contains at least one field with a type that doesn't impl Unpin, that makes our struct also !Unpin. Meaning that in a method call context where self is self: Pin<&mut Self>, we won't be able to derefmut (borrow data in an exclusive dereference &mut self.field1) any of our fields whether they themselves are Unpin or not. When in this situation, we have no choice but to access any and all fields with unsafe accessors such as Pin::map_unchecked_mut.
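
As an illustration of that field access, here is a sketch with a made-up `Task` struct whose `counter` field is itself Unpin, while the struct as a whole is not:

```rust
use std::marker::PhantomPinned;
use std::pin::Pin;

struct Task {
    counter: u32,        // ordinary `Unpin` field
    _pin: PhantomPinned, // makes the whole struct `!Unpin`
}

impl Task {
    fn bump(self: Pin<&mut Self>) -> u32 {
        // `self.counter += 1` would not compile here: no `DerefMut` for `Pin<&mut Task>`.
        // SAFETY: `counter` is never treated as pinned and `Task` itself is not moved.
        let counter: &mut u32 =
            unsafe { self.map_unchecked_mut(|t| &mut t.counter) }.get_mut();
        *counter += 1;
        *counter
    }
}

fn demo() -> u32 {
    let mut task = Box::pin(Task { counter: 0, _pin: PhantomPinned });
    task.as_mut().bump();
    task.as_mut().bump()
}

fn main() {
    println!("counter after two bumps: {}", demo());
}
```

Crates such as pin-project exist to generate this kind of projection without hand-written unsafe code, but the sketch above sticks to the standard library.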

Unsafe

I said that objects implementing a common trait, with regard to the issue of pinning, can come in two flavors. Safe to use/move regularly and unsafe ones that require being used carefully through their Pin exposed api. But there is also a third option. One typically unsafe to use except for certain scenarios. See the problem here is overlapping api safety with common trait implementers. Given the following api

trait Trait {
    fn poll1(self: Pin<&mut Self>);
    fn poll2(self: Pin<&mut Self>);
}

There are three families of implementers

  1. Totally safe
  2. Unsafe with all methods
  3. Unsafe with only poll1() method.

Given these overlapping api safety concerns, there is no way to satisfy everyone except to make all the methods demand that they are used with Pin. The Totally safe objects don't have to worry about any of this because they impl Unpin and type machinery at compile time that I mentioned earlier kicks in. Inconvenient to have to + Unpin and Pin::new(..) every time, but it's a small price.

The real issue arises with implementers 2 and 3. The third implementer would really like to use poll2() without having to be boxed. It's just not necessary because the unsafety of moving does not apply to them for that particular method poll2(). But 3 can't claim to be Unpin, because they are unsafe sometimes. In these situations we just have to trust the users of these objects to either err on the side of caution and Pin<Box<T>> all unsafe !Unpin objects, both 2 and 3, or, if the user is really confident that they hold the third implementer, call the unsafe API for Pin such as new_unchecked, map_unchecked_mut and get_unchecked_mut on this third semi-safe implementer. What happens then happens.


I know that was really long-winded. Some technical details I mentioned might be inaccurate. But the point was for me to test if I understand the essence and purpose of Pin and if I am incorrect, be challenged on that understanding. Pin is just a formalization of safe api usage patterns for a specific type of problem. There are two types of users of Pin. The ones who implement the unsafe data structures and want to enforce the api on users. And the users of those data structures who have to go through Pin.

I haven't read your full post yet (but happy to read it later). However, I would probably disagree that moving is usually a heap phenomenon. Consider the following example:

fn main() {
    let mut a = "Alice".to_string();
    let mut b = "Bob".to_string();
    println!("*a: str         = {a}");
    println!("addr(a: String) = {:?} (stack)", &a as *const _);
    println!("addr(*a: str  ) = {:?} (heap)", &*a as *const _);
    println!("*b: str         = {b}");
    println!("addr(b: String) = {:?} (stack)", &b as *const _);
    println!("addr(*b: str  ) = {:?} (heap)", &*b as *const _);
    println!("Swappppp!");
    std::mem::swap(&mut a, &mut b);
    println!("*b: str         = {b}");
    println!("addr(b: String) = {:?} (stack)", &b as *const _);
    println!("addr(*b: str  ) = {:?} (heap)", &*b as *const _);
    println!("*a: str         = {a}");
    println!("addr(a: String) = {:?} (stack)", &a as *const _);
    println!("addr(*a: str  ) = {:?} (heap)", &*a as *const _);
}


Output:

*a: str         = Alice
addr(a: String) = 0x7ffea37c5f78 (stack)
addr(*a: str  ) = 0x5572d9e169d0 (heap)
*b: str         = Bob
addr(b: String) = 0x7ffea37c5f90 (stack)
addr(*b: str  ) = 0x5572d9e169f0 (heap)
Swappppp!
*b: str         = Alice
addr(b: String) = 0x7ffea37c5f90 (stack)
addr(*b: str  ) = 0x5572d9e169d0 (heap)
*a: str         = Bob
addr(a: String) = 0x7ffea37c5f78 (stack)
addr(*a: str  ) = 0x5572d9e169f0 (heap)

Here "Alice"'s heap address stays constant (0x5572d9e169d0). Instead, the "non-allocated" part of the String got moved (which holds the pointer to the heap). So what's swapped here is really memory on the stack.


You're right. I get tripped up by variable shadowing and start to think otherwise:

    let a = 1;
    let a_ref = &a;
    let a = 2; // I guess only shadows
    dbg!(a_ref); // 1

So the a at line 1 is still technically there, but now a new variable at line 3 has taken its name.

Yes.

You can also try this:

    println!("Swappppp?");
    // Carefully compare output of either of these two variants
    //(a, b) = (b, a);
    //let (a, b) = (b, a);
}


Note that when you write let (a, b) = (b, a); it's not really guaranteed whether the new a and b reuse the same stack memory or end up at entirely different addresses (at least that's what I believe).


No, &T absolutely is exposed. Just &mut T isn't, assuming T: !Unpin.

The role of Pin<&T> is somewhat niche, it does not limit &T access, but its existence still promises that nobody gets any (unpermitted) &mut T access to the same place in any other way.
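
A short sketch of that point, using a made-up `Pinned` type that is !Unpin. Shared access still works both through plain deref and through Pin<&T>:

```rust
use std::marker::PhantomPinned;
use std::pin::Pin;

struct Pinned {
    value: u32,
    _pin: PhantomPinned,
}

fn demo() -> (u32, u32) {
    let boxed: Pin<Box<Pinned>> = Box::pin(Pinned { value: 42, _pin: PhantomPinned });
    // `Pin<P>` implements `Deref` whenever `P` does, regardless of `Unpin`.
    let shared: &Pinned = &*boxed;
    // The same shared reference, obtained via `Pin<&Pinned>`.
    let also_shared: &Pinned = boxed.as_ref().get_ref();
    (shared.value, also_shared.value)
}

fn main() {
    println!("{:?}", demo());
}
```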


Some more comments. Disclaimer: I'm not soooo familiar with Pin/Unpin either, so take my remarks with caution.

I think pin projection always refers to going from Pin<&mut Struct> to either Pin<&mut SomeField> or &mut SomeField (but not both), where SomeField is a field of Struct.

To my knowledge, this isn't about going from Pin<Box<T>> to Pin<&mut T>.

I think Unpin is not about avoiding heap allocations where unnecessary. Pinning is also possible on the stack, see tokio::pin!, for example. The problem is: You might want to be able to obtain a &mut to some value later for other reasons (I believe). And if we didn't have Unpin we would have to give up that possibility (in safe Rust) even in cases where it was not necessary.
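
For what it's worth, here is a sketch of pinning without any heap allocation. It uses std::pin::pin! from the standard library (available in recent Rust versions) as a stand-in for tokio::pin!, and the `Tracker` type is made up for illustration.

```rust
use std::marker::PhantomPinned;
use std::pin::{pin, Pin};

struct Tracker {
    hits: u32,
    _pin: PhantomPinned, // makes `Tracker` `!Unpin`
}

impl Tracker {
    fn hit(self: Pin<&mut Self>) -> u32 {
        // SAFETY: only a field is mutated; the value is not moved.
        let this = unsafe { self.get_unchecked_mut() };
        this.hits += 1;
        this.hits
    }
}

fn demo() -> u32 {
    // Pinned in this stack frame; no Box involved.
    let mut tracker = pin!(Tracker { hits: 0, _pin: PhantomPinned });
    tracker.as_mut().hit();
    tracker.as_mut().hit()
}

fn main() {
    println!("hits: {}", demo());
}
```

After pin! the original value is no longer reachable by name, so safe code cannot move it out from under the Pin<&mut Tracker>.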


I actually have a question that I can't resolve in my head. So there's a crate called async_trait.

Basically lets you

impl MyAsyncTrait for MyStruct {
    fn make_fut(&mut self) -> Pin<Box<dyn Future>> {
         // Inner async fn that the outer method wraps
         async fn _make_fut(_self: &mut MyStruct) -> () {
              // ....
         }

        Box::pin(_make_fut(self))
    }
}

But if I box MyStruct and then call make_fut to return a Future which contains an exclusive reference to MyStruct, and the returned future later runs and mutates _self to the point it gets heap reallocated, won't that cause a dangling pointer?

I think you missed a lifetime here:

fn make_fut<'async>(&'async mut self) -> Pin<Box<dyn Future + 'async>> {

See this example from the docs for the full signature.

Thank you for going through my post and correcting some things. I'll have some time later today to make amendments to the post.

I think you missed a lifetime here:

I meant that in this segment I could

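impl MyAsyncTrait for MyStruct {
    fn make_fut(&mut self) -> Pin<Box<dyn Future>> {
         async fn _make_fut(_self: &mut MyStruct) -> () {
              // Overwrite the field with a large fixed-size array
              _self.field1 = [0i128; 10000000];
         }

        Box::pin(_make_fut(self))
    }
}

// Assumes an async runtime attribute such as #[tokio::main]
async fn main() {
     let mut s: Box<dyn MyAsyncTrait> = Box::new(MyStruct { field1: [0; 10000000] });
     let f = s.make_fut();
     f.await;
}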

The self access patterns for make_fut and _make_fut are &mut Self and &mut MyStruct. Not self: &mut Box<Self> or anything indicating self is being accessed behind a Box pointer. Since the Box is what allows mutation of its inner value to be safely done on the heap without ending up with a dangling pointer when you mutate and trigger reallocation from the box allocator, I'm confused whether _self.field1 = [0i128; 10000000]; is accessing a box pointer or a derefed self.

If you have a Box<Value>, then you can't cause the Value to be reallocated just using a &mut Value reference obtained from that Box, because a reference always has a fixed size. Of course, you could have something like a &mut Box<Value>, or a &mut Vec<Value>, which can be reallocated, but that modifies the pointer behind the reference, so nothing is left dangling.

If you have a Box<Value>, then you can't cause the Value to be reallocated just using a &mut Value reference obtained from that Box, because a reference always has a fixed size.

Not the pointer itself. The address that the pointer was pointing to. I'm saying, doesn't the data get moved out of there? As in, the data no longer fits in that location because of mutation and subsequent reallocation.

    let mut s = Box::new(Struct { field1: vec![] });
    let s_mut: &mut Struct = s.as_mut(); // We don't have a &mut Box<Struct> but a &mut Struct
    s_mut.field1.extend(Vec::from_iter(0..10000000)); // cause reallocation
    // s_mut should be dangling here???

s.as_mut() gives us a non box/unmanaged pointer to Struct?

It will just access the target of the original outer &mut self method argument. The future would capture the mutable reference and thus the returned boxed future contains (and keeps alive, i. e. the borrow checker will enforce this) the &mut self borrow all the way until the future is run to completion and/or dropped.

The heap allocation for the Pin<Box<dyn Future...>> would only contain the mutable reference to MyStruct, it would never be given ownership of the MyStruct itself.

Since this lifetime relation between &mut self and the returned boxed dyn Future is not expressed in the type signature in your code, you will get a compilation error unless you change it to something like @jbe wrote above.
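
To illustrate, here is a hand-written sketch of roughly that shape, not the macro's actual expansion. The lifetime name 'a, the inner function name, and the NoopWake helper are assumptions made for this example; the waker exists only so the future can be polled once without an async runtime.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

struct MyStruct {
    field1: Vec<u8>,
}

trait MyAsyncTrait {
    // The returned future is tied to the `&'a mut self` borrow.
    fn make_fut<'a>(&'a mut self) -> Pin<Box<dyn Future<Output = ()> + 'a>>;
}

impl MyAsyncTrait for MyStruct {
    fn make_fut<'a>(&'a mut self) -> Pin<Box<dyn Future<Output = ()> + 'a>> {
        async fn inner(this: &mut MyStruct) {
            this.field1.extend(0..100u8);
        }
        // The boxed future holds only the reference, never `MyStruct` itself.
        Box::pin(inner(self))
    }
}

// Minimal waker that does nothing, just enough to poll by hand.
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

fn demo() -> usize {
    let mut s = Box::new(MyStruct { field1: Vec::new() });
    {
        let mut fut = s.make_fut();
        // s.field1.clear(); // rejected: `s` is mutably borrowed while `fut` is alive
        let waker = Waker::from(Arc::new(NoopWake));
        let mut cx = Context::from_waker(&waker);
        assert!(matches!(fut.as_mut().poll(&mut cx), Poll::Ready(())));
    }
    // The borrow ended with `fut`, so `s` is usable again.
    s.field1.len()
}

fn main() {
    println!("field1 length: {}", demo());
}
```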


I think I have a hole in my knowledge of how references relate to pointers. When I think of &mut T where T is not a smart pointer like Box, just a regular struct, I picture this:

&mut T ------> 0x000ffe7823
pointer        address

So when a Box<T> gives me back a &mut T, I'm thinking it gives me a copy of the pointer to the heap data (the actual T) it manages. So if I operate on &mut T without going through the Box methods, how can I mutate the data on the heap if the Box isn't there to trigger a reallocation when it's needed and to update the pointer to the new location it gets back from an alloc call or something?

Granted, for variables whose size is statically known, like a fixed array, there won't be a need for a reallocation. But what if s_mut.field1 is a Vec that does grow and isn't fixed size? If the Vec "moves" itself to a new heap location, does the struct it's contained in have to move as well?

Okay having thought about it I realize where I'm wrong. If the struct contains types of statically known size, then there is no problem.

If it contains fields that are heap growable, then there is still no problem, because it doesn't actually contain them, in the sense that the struct is not located where that heap data is located, even though the struct is also on the heap.

field1: Box<u8>:
s.field1.as_mut() returns a mut reference to u8. I can mutate it. But who cares, it won't move.

field1: Vec<u8>
&mut s.field1 gives a mut reference to Vec<u8>. I can mutate it, but I'm only mutating it through another pointer, which is technically the Vec. So really what's happening is

&mut self ------> &mut Vec --------> [0x000ffe7823, ........]
pointer           another pointer!   heap address

Somehow I missed the fact that any field on a struct that is heap allocated is going to be a field containing a smart pointer like Box or Vec in safe Rust, and that even a reference/pointer to any one of these fields is just referencing yet another pointer. The situation would be different if the field was a raw pointer.
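
A quick way to check this conclusion, sketched with a made-up `Holder` struct: growing the Vec through a &mut obtained from the Box may relocate the Vec's own buffer, but the struct inside the Box stays at the same address.

```rust
struct Holder {
    field1: Vec<u32>,
}

fn demo() -> (bool, usize) {
    let mut s = Box::new(Holder { field1: Vec::new() });
    let s_mut: &mut Holder = s.as_mut();
    // Address of the struct itself (inside the Box allocation).
    let struct_addr_before = &*s_mut as *const Holder as usize;
    // Grows the Vec's separate heap buffer, possibly reallocating it.
    s_mut.field1.extend(0..1_000_000u32);
    let struct_addr_after = &*s_mut as *const Holder as usize;
    (struct_addr_before == struct_addr_after, s_mut.field1.len())
}

fn main() {
    let (same_address, len) = demo();
    println!("struct stayed put: {same_address}, elements: {len}");
}
```

The Vec field is just a pointer, length and capacity stored inline in `Holder`; only what that pointer refers to can change location.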
