How does the Pin smart pointer keep data fixed?

I know that the purpose of Rust's Pin is to solve the problem of self-referential structs: it is used to prevent data that refers to itself from being moved. When I looked at the source code of Pin, it essentially seems to be a pointer. I came up with my own solution for keeping data immovable: allocate the data on the heap and use a pointer to refer to it. Then, when moving, we simply copy the pointer. This way, the data stays fixed on the heap while the pointer's own location keeps changing, but its value, the address of the data on the heap, stays the same.

Even if the data is allocated on the stack rather than the heap, the same approach can be applied by using a pointer to the start of the data on the stack. Since we cannot be sure when the stack frame is deallocated, we would use unsafe operations when manipulating the pointer. The pointer may end up pointing to invalid data once the stack is deallocated, but that is still within the realm of unsafe logic: when using the pointer, we have to ensure that the stack data is still valid. If we cannot guarantee that, there's no solution anyway, since we are already performing unsafe operations.

Furthermore, the pointers I mentioned above could be implemented using smart pointers.

However, despite searching online, I haven't found a clear explanation of how Pin solves the immovability problem and what kind of data is being moved. I also noticed that some uses of Pin add another layer of pointer, which seems unnecessary to me. For example:

Pin<Box<dyn Future<Output = T> + Send + 'static>>

Pin itself is already a pointer, and here it is wrapped around a Box pointer. Can't we remove Pin? It could be turned into

Box<dyn Future<Output = T> + Send + 'static>

This way, the Box pointer points to the future's data, and when moving, we simply copy the Box itself. The future's data is not moved either: the Box is just a pointer to the future, so we only copy the pointer and the data it points to stays where it is.

So, is there something wrong with what I mentioned? How does Pin actually solve the issue of immovable data?

When I looked at the source code of Pin, it essentially seems like a pointer.
… Pin itself is already a pointer,

This is critical to understand: No, a Pin is not a pointer. Pin is a type that (when used as intended) contains a pointer. When you write Pin<Box<T>>, the Box is the pointer that Pin is controlling. There is only one pointer.

And I came up with a solution to address the immovability of data, which is to allocate the data on the heap and use a pointer to reference it. Then, when moving, we can simply copy the pointer.

This is exactly what Pin<Box<T>> does. Pin is a formalization of this idea that, if we point to something, we can refrain from moving it and only move the pointer.
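A minimal sketch of that idea using the standard library's Box::pin (the helper function name is made up for illustration): moving the Pin<Box<String>> handle only copies the pointer, so the heap address of the String is the same before and after the move.

```rust
use std::pin::Pin;

// Hypothetical helper: heap address of the pinned String before and after
// moving the Pin<Box<String>> handle into a new variable.
fn addresses_before_and_after_move() -> (usize, usize) {
    let pinned: Pin<Box<String>> = Box::pin(String::from("hello"));
    let before = &*pinned as *const String as usize;
    // Moving the handle copies only the Box pointer; the String stays on the heap.
    let moved = pinned;
    let after = &*moved as *const String as usize;
    (before, after)
}

fn main() {
    let (before, after) = addresses_before_and_after_move();
    println!("before: {before:#x}, after: {after:#x}");
}
```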

Even if the data is not allocated on the heap but in the stack, the same approach can be applied by using a pointer to the starting position of the data in the stack.

This is what the pin! macro does: it creates a pointer to a stack variable, and provides it to you as Pin<&mut T>.
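A small sketch of the stack case, assuming Rust 1.68 or newer (where std::pin::pin! is stable); NotUnpin and read_value are hypothetical names. The value is moved into the macro, so there is no other name left for it, and the caller only ever holds a Pin<&mut NotUnpin> pointing into the current stack frame.

```rust
use std::marker::PhantomPinned;
use std::pin::{pin, Pin};

// Hypothetical !Unpin type, used only for illustration.
struct NotUnpin {
    value: u32,
    _pin: PhantomPinned,
}

// Accepts only a pinned mutable reference; shared reads still work through Deref.
fn read_value(p: Pin<&mut NotUnpin>) -> u32 {
    p.value
}

fn stack_pinned_value() -> u32 {
    // `pin!` moves the value into a hidden local on the stack and returns a
    // Pin<&mut NotUnpin> to it; no named binding remains that could be moved.
    let pinned: Pin<&mut NotUnpin> = pin!(NotUnpin { value: 7, _pin: PhantomPinned });
    read_value(pinned)
}

fn main() {
    println!("{}", stack_pinned_value());
}
```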

This is how Pin<Box<T>> already works. The key thing that Pin<Box<T>> does that Box<T> does not is: prohibit moving out of the Box. In general, Pin takes an existing pointer type and provides the additional guarantee that the pinned value (the value the pointer points to) won't be moved.


The value of having Pin rather than just a PinnedBox type is that you can produce Pin<&mut ...> pointers to parts of the pinned value ("pin projection"). This allows a complex pinned type (like a future), composed out of other pinning-needing types, to address those parts of itself without needing more than one pinned allocation.
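A hand-written sketch of pin projection (Inner, Outer, bump, inner, and label are hypothetical names; crates like pin-project generate this kind of code for you). One Pin<Box<Outer>> allocation is enough: the !Unpin field is reached as Pin<&mut Inner>, while the field that does not need pinning is handed out as a plain &mut String. The unsafe blocks rely on the stated SAFETY assumptions.

```rust
use std::marker::PhantomPinned;
use std::pin::Pin;

// Hypothetical !Unpin inner type (e.g. a piece of a hand-written future).
struct Inner {
    count: u32,
    _pin: PhantomPinned,
}

// Outer is !Unpin too, because one of its fields is.
struct Outer {
    inner: Inner,  // structurally pinned: only ever reached as Pin<&mut Inner>
    label: String, // not structurally pinned: a plain &mut String is fine
}

impl Inner {
    fn bump(self: Pin<&mut Self>) -> u32 {
        // SAFETY: we only modify `count` in place and never move `self`.
        let this = unsafe { self.get_unchecked_mut() };
        this.count += 1;
        this.count
    }
}

impl Outer {
    // Project Pin<&mut Outer> to Pin<&mut Inner>.
    fn inner<'a>(self: Pin<&'a mut Self>) -> Pin<&'a mut Inner> {
        // SAFETY: `inner` is never moved out of Outer, and Outer has no Drop impl.
        unsafe { self.map_unchecked_mut(|o| &mut o.inner) }
    }

    // Project Pin<&mut Outer> to a plain &mut String.
    fn label<'a>(self: Pin<&'a mut Self>) -> &'a mut String {
        // SAFETY: we never hand out a Pin<&mut String>, and `inner` is not touched.
        unsafe { &mut self.get_unchecked_mut().label }
    }
}

// One pinned allocation; both fields are reached through projections.
fn project_demo() -> (u32, String) {
    let mut outer: Pin<Box<Outer>> = Box::pin(Outer {
        inner: Inner { count: 0, _pin: PhantomPinned },
        label: String::from("a"),
    });
    outer.as_mut().inner().bump();
    let n = outer.as_mut().inner().bump();
    outer.as_mut().label().push('b');
    (n, outer.label.clone())
}

fn main() {
    println!("{:?}", project_demo());
}
```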

This is the clearest description of Pin I've read and the questions that Rgoogle asked were really insightful. Thank you, both of you!

So, how does Pin ensure that the pinned data is not moved?

It is not just about moving or copying. When we use * to dereference an Arc or Rc pointer, or any other smart pointer, there are two possibilities. If the data is not copyable, dereferencing the pointer may fail or move ownership (which should result in an error). If the data is copyable, the * dereference creates a copy of the data.

Therefore, Pin must handle the deref and deref_mut functions differently. However, when I looked at Pin's implementation of these two functions, it seemed that they dereference the pointer, add &, and then wrap the result in another Pin.

The result is Pin<&data>, where data is the content wrapped in the Box. If we dereference that again with *, it depends on whether data implements Deref or DerefMut. If it does, adding * gives the same result as above: the wrapped data is dereferenced and wrapped in another Pin.

So, is this how Pin ensures that the data is not moved? It relies on deref and deref_mut never handing out a reference to the data itself, only the inner data wrapped in Pin. It's as if dereferencing a Pin gives you the same kind of Pin, except that the data wrapped inside is different: it is the result of the dereference.
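One way to check what these impls actually return is to annotate the types. In this sketch (the function name is hypothetical), dereferencing a Pin<Box<String>> yields a plain &String, while Pin::as_ref is the method that re-wraps the reference as Pin<&String>.

```rust
use std::pin::Pin;

fn what_deref_returns() -> (usize, usize) {
    let pinned: Pin<Box<String>> = Box::pin(String::from("abc"));
    // Deref on Pin<Box<String>> hands out a plain shared reference, not another Pin.
    let shared: &String = &*pinned;
    // as_ref is the method that re-wraps the reference in a Pin.
    let pinned_ref: Pin<&String> = pinned.as_ref();
    (shared.len(), pinned_ref.len())
}

fn main() {
    println!("{:?}", what_deref_returns());
}
```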

So, how does Pin ensure that the data is not moved?

It's really confusing.

In order for data not to be movable you need two factors. The data needs to be behind a pinned reference, which limits the API as discussed above; but crucially, the data itself must also be of a type that is defined as “the type’s author thinks it can be relevant for the type not to be movable”, which is expressed via the Unpin trait. Specifically, only data of types T that don’t implement Unpin can actually be prevented from being moved in the first place.

Unpin is an auto-trait, so it’s automatically implemented for most types; the way to opt out is to include a field of type PhantomPinned (which incidentally is probably one of the only types that is !Unpin but Copy). Making use of the pinning guarantees, however, then typically also involves unsafe code from the type’s author.[1]
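A compile-time sketch of the auto-trait behaviour (assert_unpin, Plain, and Fixed are hypothetical names): a generic function with a T: Unpin bound only compiles for types that implement Unpin. A struct whose fields are all Unpin gets Unpin automatically, adding a PhantomPinned field removes it, and Box<T> stays Unpin for any T, because moving the box never moves its contents.

```rust
use std::marker::PhantomPinned;

// Compiles only if T implements Unpin, so it works as a compile-time check.
fn assert_unpin<T: Unpin>() {}

// Hypothetical types for illustration.
struct Plain {
    n: u32, // every field is Unpin, so Plain is Unpin automatically
}

struct Fixed {
    n: u32,
    _pin: PhantomPinned, // this field opts the whole struct out of Unpin
}

// Returns true if all of the checks below compiled.
fn unpin_checks() -> bool {
    assert_unpin::<u32>();
    assert_unpin::<String>();
    assert_unpin::<Plain>();
    // Box<T> is Unpin for every T: moving the box never moves the T.
    assert_unpin::<Box<Fixed>>();
    // Uncommenting either line is a compile error, since neither type is Unpin:
    // assert_unpin::<Fixed>();
    // assert_unpin::<PhantomPinned>();
    true
}

fn main() {
    let plain = Plain { n: 1 };
    let fixed = Fixed { n: 2, _pin: PhantomPinned };
    assert!(unpin_checks());
    println!("{} {}", plain.n, fixed.n);
}
```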

Such types will most typically also not implement Copy (in fact I believe that’s essentially always the case) because, as you correctly described, otherwise you could simply copy (and thus sort-of “move”) a pinned value by dereferencing.

Otherwise, data behind a reference in Rust can only be moved by helper APIs, which include mem::replace for example. Such APIs can move data from behind mutable references, which is why Pin<PointerTo<T>>’s main job is to prevent any API that can obtain a &mut T reference to the target. Types like Pin<Box<T>> or Pin<&mut T> thus do not implement DerefMut (except for target types T: Unpin, which cannot “really” be pinned in the first place), but they do implement Deref, because immutable &T references provide no way to move the values.
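A sketch contrasting the two cases (the function names and the Fixed type are hypothetical). For String, which is Unpin, Pin<Box<String>> implements DerefMut, so mem::replace can move the value out. For a !Unpin type only Deref is available: reading a field works, and the commented-out mem::replace line would be rejected by the compiler.

```rust
use std::marker::PhantomPinned;
use std::mem;
use std::pin::Pin;

// String is Unpin, so Pin<Box<String>> implements DerefMut and the value can be moved out.
fn replace_through_unpin_pin() -> (String, String) {
    let mut pinned: Pin<Box<String>> = Box::pin(String::from("old"));
    // `&mut *pinned` goes through DerefMut, which exists only because String: Unpin.
    let old = mem::replace(&mut *pinned, String::from("new"));
    (old, (*pinned).clone())
}

// Hypothetical !Unpin type.
struct Fixed {
    data: u32,
    _pin: PhantomPinned,
}

fn read_through_pinned_fixed() -> u32 {
    let pinned: Pin<Box<Fixed>> = Box::pin(Fixed { data: 5, _pin: PhantomPinned });
    // Shared access through Deref is always available...
    let value = pinned.data;
    // ...but this line would not compile, since there is no DerefMut for Fixed: !Unpin:
    // let _ = mem::replace(&mut *pinned, Fixed { data: 0, _pin: PhantomPinned });
    value
}

fn main() {
    println!("{:?} {}", replace_through_unpin_pin(), read_through_pinned_fixed());
}
```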


  1. However, there are also ways to define sensible !Unpin types without writing unsafe yourself. One is that all async {} blocks and async fn futures are compiler-generated anonymous types that are !Unpin (and !Copy) and make use of this property in their implementation; the other is that types wrapping other !Unpin types can make use of “structural pinning” of fields without explicit unsafe, through helper macros like the pin-project crate.

So you're saying that all types in Rust are movable by default and automatically implement the Unpin trait.

For types that are not movable, you manually implement the !Unpin trait.

What is the purpose of these two traits? Does it mean that the compiler will give an error if it detects moving of non-movable data?

If that's the case, what is the significance of wrapping it with Pin?

As mentioned by the previous user, this is how Pin<Box> already works. The important thing that Pin<Box> does, which Box does not, is to prohibit moving out of the Box.

How does Pin ensure that data cannot be moved out?

So, how does Pin ensure that the pinned data is not moved?

Pin is two things here.

First, when T: !Unpin, Pin<SomePtr<T>> never hands out a &mut T except as an unsafe operation. This way, the owner of a Pin<SomePtr<T>> cannot use std::mem::swap or similar to move the pinned data. @steffahn just wrote more about that above.

Second, the act of creating a Pin (also unsafe when T: !Unpin) is a promise by its creator that the data is not going to be moved. For example, Box::pin() is safe and creates a Pin<Box<T>>; it does not allow T to be moved because Box is always a unique owning pointer, so there is no way to get at the T except through the Box. Thus, Pin<Box<T>> is both unique like Box and prohibits the ways you might move out of a Box, because it's wrapped in Pin (so you cannot call Box::into_inner() or anything else that would move out).

Pin doesn't do much; it is a type that denotes a prior promise, and its methods are designed not to break that promise. When you see a Pin<SomePtr<T>>, you can rely on one of these being true:

  • T: Unpin (read this as "T does not need to be immovable; you are allowed to unpin it"), or
  • Somebody called Pin::new_unchecked in order to make the promise of not moving.

Pin is a way to communicate that promise from the creator of the Pin to the user of it.
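A sketch of the two bullet points (Fixed and both function names are hypothetical). When the target is Unpin, the safe Pin::new needs no promise at all. When it is !Unpin, Pin::new_unchecked is unsafe, and the caller states the promise in a SAFETY comment; here the Box is moved straight into the Pin, which is what the safe Box::pin does internally in effect.

```rust
use std::marker::PhantomPinned;
use std::pin::Pin;

// Hypothetical !Unpin type.
struct Fixed {
    data: u32,
    _pin: PhantomPinned,
}

// Case 1: the target is Unpin, so the safe Pin::new is available (no promise needed).
fn pin_new_for_unpin() -> u32 {
    let mut x = 10u32;
    let mut p: Pin<&mut u32> = Pin::new(&mut x);
    *p += 1; // DerefMut is available as well, because u32: Unpin
    *p
}

// Case 2: the target is !Unpin, so creating the Pin is an unsafe promise.
fn pin_new_unchecked_for_box() -> u32 {
    let boxed: Box<Fixed> = Box::new(Fixed { data: 3, _pin: PhantomPinned });
    // SAFETY: the Box is moved into the Pin right here, so nobody keeps an
    // unpinned handle to the Fixed value afterwards.
    let pinned: Pin<Box<Fixed>> = unsafe { Pin::new_unchecked(boxed) };
    pinned.data
}

fn main() {
    println!("{} {}", pin_new_for_unpin(), pin_new_unchecked_for_box());
}
```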

No, it’s done purely in the library, without any compiler support for enforcing anything. For example, the library function for obtaining &mut T from Pin<Box<T>> has a T: Unpin bound, while the library function that gets you a Pin<&mut T> doesn't.
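A sketch of those two library functions from std (the wrapper function names and the Fixed type are hypothetical): Pin::as_mut reborrows a Pin<Box<T>> as Pin<&mut T> with no Unpin bound, while Pin::get_mut, which turns a Pin<&mut T> into a plain &mut T, requires T: Unpin. For a !Unpin type the first call compiles and the second does not.

```rust
use std::marker::PhantomPinned;
use std::pin::Pin;

// Hypothetical !Unpin type.
struct Fixed {
    _pin: PhantomPinned,
}

fn as_mut_then_get_mut() -> String {
    let mut pinned: Pin<Box<String>> = Box::pin(String::from("a"));
    // Pin::as_mut has no Unpin bound: it just reborrows as Pin<&mut String>.
    let reborrowed: Pin<&mut String> = pinned.as_mut();
    // Pin::get_mut requires String: Unpin and returns a plain &mut String.
    let plain: &mut String = reborrowed.get_mut();
    plain.push('b');
    (*pinned).clone()
}

fn as_mut_on_not_unpin() {
    let mut pinned: Pin<Box<Fixed>> = Box::pin(Fixed { _pin: PhantomPinned });
    // Reborrowing as Pin<&mut Fixed> is still fine...
    let _reborrowed: Pin<&mut Fixed> = pinned.as_mut();
    // ...but this would not compile, because Fixed does not implement Unpin:
    // let _plain: &mut Fixed = _reborrowed.get_mut();
}

fn main() {
    println!("{}", as_mut_then_get_mut());
    as_mut_on_not_unpin();
}
```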

The reason why it’s done purely in libraries is, I think, mostly “because we can”: it was possible to do it with an API defined in the standard library, so the extra complication of changing the language could be avoided.[1]

The purpose of Unpin (which is only one trait; I’m writing !Unpin to mean “doesn’t implement Unpin”) is of course a bit curious: why make it so that types can be put behind Pin but then aren’t actually pinned? The answer is that pinned references (particularly Pin<&mut T>) very commonly have to be created in order to conform to a particular generic interface. First and foremost, and the very thing Pin was invented for, this is the Future trait with its fn poll(self: Pin<&mut Self>, …) -> … method.

This method signature uses Pin<&mut Self> instead of &mut Self to allow the common use case of futures from async fn or async blocks, which involve self-referencing data types that are unmovable after the first poll. However, it doesn’t intend to force pinning on users whose Future types don’t need it. Hence the design of Pin is such that a type that opts out of pinning by implementing Unpin (often automatically, so arguably it’s rather an opt-in for pinning than an opt-out) can be freely converted between Pin<&mut T> and &mut T. (The direction I had not yet mentioned is Pin::new, which creates, for example, a Pin<&mut T> from a &mut T when T: Unpin.)
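A sketch of an Unpin future written by hand and polled manually (Countdown, noop_waker, and poll_countdown are hypothetical names; the no-op waker is built from std's RawWaker API only so the example needs no executor). Because Countdown is Unpin, poll can call self.get_mut(), and the caller can use the safe Pin::new(&mut fut) without any pinned allocation.

```rust
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// Hypothetical hand-written future: Pending `remaining` times, then Ready.
// Its only field is a u32, so Countdown is Unpin automatically.
struct Countdown {
    remaining: u32,
}

impl Future for Countdown {
    type Output = &'static str;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        // Because Countdown: Unpin, Pin<&mut Self> converts freely to &mut Self.
        let this: &mut Countdown = self.get_mut();
        if this.remaining == 0 {
            Poll::Ready("done")
        } else {
            this.remaining -= 1;
            cx.waker().wake_by_ref(); // ask to be polled again
            Poll::Pending
        }
    }
}

// A waker whose operations all do nothing; enough for polling by hand.
fn noop_waker() -> Waker {
    unsafe fn clone(_: *const ()) -> RawWaker {
        RawWaker::new(std::ptr::null(), &VTABLE)
    }
    unsafe fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    // SAFETY: none of the vtable functions touch the data pointer, so null is fine.
    unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) }
}

// Polls a Countdown to completion, counting how many times it was Pending.
fn poll_countdown() -> (u32, &'static str) {
    let waker = noop_waker();
    let mut cx = Context::from_waker(&waker);
    let mut fut = Countdown { remaining: 2 };
    let mut pending = 0;
    loop {
        // Pin::new is allowed here only because Countdown: Unpin.
        match Pin::new(&mut fut).poll(&mut cx) {
            Poll::Pending => pending += 1,
            Poll::Ready(out) => return (pending, out),
        }
    }
}

fn main() {
    println!("{:?}", poll_countdown());
}
```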

So, TL;DR: the purpose of Unpin is to let APIs like the Future trait still leave the choice of whether pinning is used to the right party, namely the author of the type that might require pinned values to implement its functionality.

Also, this demonstrates well why Pin is so hard to learn. It is a concept that is generic along two axes:

  • it’s generic over the type of pointer being used, which is actually a higher-order kind of generics, sort-of, because it involves a generic parameter where you input types that themselves usually have a generic parameter
  • it’s generic over whether or not pinning actually takes place, which is determined by Unpin implementations of the target type

If you imagine we had two pinned-pointer types that always actually disallow moving, like PinnedBox<T> and PinnedMutRef<'a, T>, then you can translate the most common use cases of Pin as follows:

| type in Rust | its meaning, if Foo: Unpin | its meaning, if Foo: !Unpin |
|---|---|---|
| Pin<Box<Foo>> | essentially just a Box<Foo> | actually pinned: PinnedBox<Foo> |
| Pin<&'a mut Foo> | essentially just a &'a mut Foo | actually pinned: PinnedMutRef<'a, Foo> |
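A sketch of the two Pin<Box<Foo>> cells in the table (Foo and the function names are hypothetical). With an Unpin target, Pin::into_inner gives the Box back, so the Pin really is "essentially just a Box". With a !Unpin target the same call is rejected, and the value can only be used or dropped in place.

```rust
use std::marker::PhantomPinned;
use std::pin::Pin;

// Hypothetical !Unpin type standing in for `Foo` in the right-hand column.
struct Foo {
    _pin: PhantomPinned,
}

// Left column: String is Unpin, so Pin<Box<String>> is essentially just a Box<String>.
fn unpin_round_trip() -> String {
    let pinned: Pin<Box<String>> = Box::pin(String::from("x"));
    // Pin::into_inner requires the target to be Unpin.
    let boxed: Box<String> = Pin::into_inner(pinned);
    *boxed // moving the String out of the Box is allowed again
}

// Right column: Foo is !Unpin, so Pin<Box<Foo>> really is pinned.
fn not_unpin_stays_pinned() {
    let pinned: Pin<Box<Foo>> = Box::pin(Foo { _pin: PhantomPinned });
    // This would not compile, because Foo does not implement Unpin:
    // let boxed: Box<Foo> = Pin::into_inner(pinned);
    drop(pinned); // dropping in place is always allowed
}

fn main() {
    println!("{}", unpin_round_trip());
    not_unpin_stays_pinned();
}
```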

  1. Of course, this also has downsides. For example, structural pinning / pin projection, which I mentioned in a footnote in my previous answer, requires either unsafe code or help from macros that hide unsafe code, instead of the compiler directly “understanding” what you’re trying to do, which can be less ergonomic.

So let me summarize. In Rust, there are two relevant things: Unpin (movable) and Copy. There is a rule that if all fields of a struct implement Unpin, then the struct is Unpin; otherwise it is !Unpin (fixed). The same rule applies to Copy.

Furthermore, Rust has an empty struct called PhantomPinned, which is !Unpin (fixed). So when we add a field of this type to our struct, it means that our type is not movable (although it can still be moved if it implements Copy). This is only a marker, not an enforced requirement. If the type does not implement Copy, then a move occurs when it is assigned or passed as a parameter.

There are four possibilities when putting data into Pin: the data may or may not implement Unpin, and it may or may not implement Copy. So there are 2 × 2 = 4 combinations.

As follows:

  In the table below, the Pin column means that the data is fixed (!Unpin); it does not refer to the Pin struct itself.

| Pin (fixed) | Copy | Behavior |
|---|---|---|
| 0 | 0 | With DerefMut; dereferencing results in an error (no Copy). |
| 0 | 1 | With DerefMut; dereferencing makes a copy of the data. |
| 1 | 0 | Without DerefMut; dereferencing results in an error (no Copy). |
| 1 | 1 | Without DerefMut; dereferencing makes a copy of the data. |

The key point is when Pin is 1 and Copy is 0 or 1.

When the data is fixed, the Pin struct does not provide a deref_mut method, because the impl is constrained by a trait bound: only types that implement Unpin get this method, while !Unpin types do not.

But why does not providing the deref_mut method ensure that !Unpin data cannot be moved? Because Rust has functions like std::mem::replace(&mut T, data) and swap(). The replace function writes the new data into the &mut T location and returns the old value, and it is safe, not unsafe. So if the data wrapped by Pin is !Unpin and we could call deref_mut, we would get a mutable reference &mut T. We could pass this reference as the first argument of replace, which returns the old value, and then assign the old value to another variable. At that point its location has changed and ownership has moved.

So as long as there is a mutable reference, the value's location can change. Therefore, Pin must solve this by never returning a &mut T, which means not providing the deref_mut method. This is achieved with a trait bound: only types that implement Unpin (movable) get this method, so !Unpin types naturally do not.
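To see why a mere &mut T is enough to break such a value, here is a sketch with a hypothetical self-referential struct (SelfRef, init, and points_to_self are made-up names). It stores a raw pointer to its own field; after a safe mem::swap through two &mut references, neither value points at itself anymore. The code only compares addresses and never dereferences the stale pointer, which is where real bugs would appear.

```rust
use std::mem;
use std::ptr;

// Hypothetical self-referential struct: `ptr` is meant to point at `self.value`.
struct SelfRef {
    value: u32,
    ptr: *const u32,
}

impl SelfRef {
    fn new(value: u32) -> Self {
        SelfRef { value, ptr: ptr::null() }
    }

    // Must be called once the value is at its final location.
    fn init(&mut self) {
        self.ptr = &self.value;
    }

    // Compares addresses only; never dereferences the raw pointer.
    fn points_to_self(&self) -> bool {
        ptr::eq(self.ptr, &self.value)
    }
}

fn swap_breaks_self_reference() -> (bool, bool) {
    let mut a = SelfRef::new(1);
    let mut b = SelfRef::new(2);
    a.init();
    b.init();
    let before = a.points_to_self() && b.points_to_self();
    // Safe code with two &mut references is enough to move both values.
    mem::swap(&mut a, &mut b);
    let after = a.points_to_self() || b.points_to_self();
    (before, after)
}

fn main() {
    println!("{:?}", swap_breaks_self_reference());
}
```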

So, Pin ensures that immovable data is not moved by not providing a deref_mut method. However, if movable data is inside Pin, it is up to the developer: you can choose to move it by calling deref_mut to get a mutable reference and then using replace or swap, which changes the value's location in memory, or you can choose not to move it by not calling deref_mut.

Thus, Pin does not forcibly guarantee that the data enclosed by it cannot be moved. It only ensures that in the case of !Unpin, the data cannot be moved by not providing the deref_mut method. In other cases, it is up to the developer to decide.

Another question is why Pin must contain a pointer. This is because it is the pointer that would let us obtain the data through deref_mut and then call swap, which moves the data. There may not be an existing smart pointer that decides whether to provide deref_mut based on whether the data implements Unpin; if there were one, it would be at the same level as Pin.

So, given these principles, could I write my own Pin implementation?

But what if the data is !Unpin (fixed) and also implements Copy?

Then you can dereference and copy the data, but that doesn't mean ownership of the data has moved. The original data stays in the same location; you just get an additional copy of it.
Does this extra copy create any unsafe situations? I tested it, and it seems not, because I cannot drop the pinned data itself: when I pass the data to drop, the compiler silently copies it, so what I'm dropping is actually the copy.
That's as far as I could test.
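A sketch of that situation (FixedButCopy and copy_out_of_pin are hypothetical names). Since PhantomPinned itself is Copy, a type can be !Unpin and Copy at the same time. Dereferencing the Pin<Box<_>> by value duplicates the value into a new local, while the pinned original keeps its heap address.

```rust
use std::marker::PhantomPinned;
use std::pin::Pin;

// Hypothetical type that is !Unpin (via PhantomPinned) but still Copy,
// which is possible because PhantomPinned itself implements Copy.
#[derive(Clone, Copy)]
struct FixedButCopy {
    n: u32,
    _pin: PhantomPinned,
}

// Returns the copied value and whether the pinned original stayed at its
// address while the copy lives somewhere else.
fn copy_out_of_pin() -> (u32, bool) {
    let pinned: Pin<Box<FixedButCopy>> = Box::pin(FixedButCopy { n: 9, _pin: PhantomPinned });
    let addr_before: *const FixedButCopy = &*pinned;
    // Deref + Copy: this duplicates the value into a new local; nothing is moved out.
    let copy: FixedButCopy = *pinned;
    let addr_after: *const FixedButCopy = &*pinned;
    let copy_is_elsewhere = !std::ptr::eq(&copy as *const FixedButCopy, addr_before);
    (copy.n, addr_before == addr_after && copy_is_elsewhere)
}

fn main() {
    println!("{:?}", copy_out_of_pin());
}
```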

yes or no ????

Please don't spam, the community is very responsive without the need to make repeated demands for answers to your question.

Not quite. Copy implementations are explicit or derived; it’s not an auto-trait. It’s not an ordinary trait either, though: you can only implement it if all fields implement it.

Dereferencing without Copy is not necessarily an error; it is only one if you access the referenced thing by value. So e.g.

let by_value = *x;

might error, but

let by_mutable_ref = &mut *x;

not necessarily.

Also, if the type does implement Copy, dereferencing only makes a copy if you access the dereferenced expression by value. (And in this case, the access goes through Deref, not DerefMut, because copying only needs immutable access.)
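A sketch of the by-value versus by-reference distinction using Rc, which only implements Deref (the function name is hypothetical). Copying a u32 out of an Rc<u32> works, borrowing a String inside an Rc<String> works, and the commented-out by-value use of the String would be rejected because it would move out of the Rc.

```rust
use std::rc::Rc;

fn deref_by_value_vs_by_ref() -> (u32, usize) {
    let shared_num: Rc<u32> = Rc::new(7);
    let shared_str: Rc<String> = Rc::new(String::from("hello"));

    // u32: Copy, so a by-value use of `*shared_num` copies it (via Deref, not DerefMut).
    let copied: u32 = *shared_num;

    // String is not Copy: taking a reference to the dereferenced place is fine...
    let borrowed: &String = &*shared_str;
    // ...but a by-value use would try to move out of the Rc and is rejected:
    // let moved: String = *shared_str;

    (copied, borrowed.len())
}

fn main() {
    println!("{:?}", deref_by_value_vs_by_ref());
}
```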

That’s exactly right!

The whole picture is a bit bigger. It’s essentially about every API that could end up moving the value (or providing a mutable reference); deref_mut and mem::replace/mem::swap are just prominent examples, and a case covered in the API of Pin itself. Another example is Pin::new, which is restricted to Unpin because otherwise you could reborrow a &mut T, put it into a Pin<&mut T>, and still have the original &mut T after the Pin<&mut T> is dropped, giving you a way to move out of data that was once declared/promised as “pinned”. It’s an overall API promise, a contract of sorts, that the data cannot be moved, and users of unsafe Rust must make sure not to break this contract.

(It’s also a contract between the one defining a datatype and the one using it, and the one defining the datatype can decide to move the struct, or parts of it (understanding structural pinning / projections is a useful thing for a deeper understanding here), as they know the underlying reason for having the pinning restriction on their type.)

The answer to this that I would give is two-fold.

On one hand, pinning data (i.e. making sure it cannot move) must involve a pointer indirection: you can always move the Pin<…> handle you have, and its contents must not move, so they must be behind a pointer. On the other hand, the reason you put a pointer type argument into Pin, like Pin<Box<T>> or Pin<&mut T>, instead of Pin itself being defined as a pointer, is generality and a smaller API. The alternative would be for each pointer type to get its own pinned version, like the PinnedBox<T> or PinnedMutRef<'a, T> I sketched above. That is a larger API surface (more new types), whereas having a single wrapper type Pin that turns a non-pinned pointer type into a pinned pointer type is quite elegant.
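For comparison, a minimal sketch of what one of those dedicated types might look like. PinnedBox here is hypothetical, not a std type: it owns a heap value and simply never hands out &mut T or the T itself, so the value cannot move. A real version would also need some pinned form of mutable access (which is what Pin<&mut T> provides), and every other pointer type would need its own copy of this design.

```rust
use std::ops::Deref;

/// Hypothetical always-pinning box (not a std type): it owns a heap value but
/// never hands out `&mut T` or the `T` itself, so the value can never move.
pub struct PinnedBox<T> {
    inner: Box<T>, // private, so callers cannot reach the Box directly
}

impl<T> PinnedBox<T> {
    pub fn new(value: T) -> Self {
        PinnedBox { inner: Box::new(value) }
    }
}

impl<T> Deref for PinnedBox<T> {
    type Target = T;
    fn deref(&self) -> &T {
        &self.inner
    }
}
// Deliberately no DerefMut and no `into_inner`: that is the entire guarantee.

// Returns true if the heap value stayed at the same address across a move.
fn address_stays_put() -> bool {
    let b = PinnedBox::new([7u8; 64]);
    let before: *const [u8; 64] = &*b;
    let moved = b; // moves only the Box pointer inside PinnedBox
    let after: *const [u8; 64] = &*moved;
    before == after && moved[0] == 7
}

fn main() {
    println!("{}", address_stays_put());
}
```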

Dropping is always irrelevant (a no-op) for data that implements Copy. So concerns about “dropping a copy” don't apply, as dropping never does anything.

Similarly, thinking about “ownership” being moved or not is not the most useful thing for Copy types, because types implementing Copy are arguably the least “owning” kind of types in Rust. I would still agree with the statement

but partially for the reason that there isn’t any ownership that could be moved to begin with. A “move” operation in Rust isn’t much more in the first place than

  • copying the shallow data
  • never using the old value again
  • making sure no drop implementation is called on the old value either
    • this is usually done by the compiler with static analysis or generated dynamic “drop flags”
    • but if you resort to manual usage of ptr::copy-like operations and unsafe code, you might encounter the need to reason about these conditions manually

Given that Copy types are always free of any drop code, you can logically (pretend to) “move” a Copy type by just copying it and never using the old value again.
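A small sketch of both points (Pair and the function names are hypothetical). std::mem::needs_drop reports whether dropping a type might run any code; its documentation describes it as a conservative optimization hint, but in practice it returns false for a plain Copy struct and true for String. For the Copy type, the "move" is just a shallow copy, and the old binding stays usable.

```rust
use std::mem;

// Hypothetical Copy type for illustration.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Pair {
    a: u32,
    b: u32,
}

fn copy_types_have_no_drop_glue() -> (bool, bool) {
    // mem::needs_drop reports whether dropping a T might run any code.
    (mem::needs_drop::<Pair>(), mem::needs_drop::<String>())
}

fn copy_then_keep_using_old() -> (Pair, Pair) {
    let original = Pair { a: 1, b: 2 };
    // For a Copy type, a "move" is just this shallow copy; `original` stays usable.
    let copied = original;
    (original, copied)
}

fn main() {
    let (x, y) = copy_then_keep_using_old();
    println!("{:?} {x:?} {y:?}", copy_types_have_no_drop_glue());
}
```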