Pin use in Future::poll

FrankReh · August 23, 2022, 2:10pm

I'm trying to understand why the Future::poll method uses Pin for the self argument.

Is there a blog or other document put out by the team that worked on async back in 2018 and 2019 that explains it?

I've been reading about poll and Pin and found lots of explanations about being able to recognize at compile time whether something can be safely moved. I think I understand the guarantees the pin module makes for pointers, and I've read a lot that goes back to 2018 and 2019, including why a few small unsound windows remain, but my understanding is missing something. Why is the restriction/safeguard passed into the poll method?

I can understand why an async runtime should not move a future around: the future might contain self references. Certainly many futures built by the compiler for async blocks and async functions can contain self references, and user futures might as well.

But I don't see why the Pin used in the Future::poll signature is a safeguard. The implementor of a particular poll method is always on the hook to write the method so it is consistent with their goal for the future, and the poll API doesn't let the method change the address that the runtime knows the future by. So how does the compiler guaranteeing that the address given to the poll method is pinned help the person writing the poll method or compiling the code? That person has to rely on the runtime being well behaved in many other ways too, not just that it doesn't move futures it has been given via await and spawn calls.

I can almost see that a generic future might not know at the time it is being written whether all the types it will be used for are safe to move, but even then, I don't see how a poll would be written where a move of the data pointed to by the poll's self could make any sense, as the runtime would still know the future by the address it had for it internally. Even the wake callback doesn't take the address; it's a closure on the task's address, presumably.

Sorry my question is so long winded. I've read so many explanations about pinning a pointer and why a future with a self reference shouldn't be allowed to move - and I think I understand those. I just don't see why a poll method would be written to move a future, and if it did, how a runtime was supposed to know about the new address anyway.

(Here's a last minute thought: is it related to what might be put into the Poll::Ready enum as that involves a move or a copy?)

2e71828 · August 23, 2022, 2:35pm

These two are closely related: No self-references actually exist until the future starts doing work, which happens inside the poll implementation. Before this point, it’s perfectly safe to move the future around in memory.

It is only sound for the poll method to calculate and store these self references if it has a guarantee that no external code will move the future, and that guarantee is provided by the Pin that's passed in. The external code pins the future once it has been moved to its final memory location, and the Pin mechanics ensure it isn't accidentally moved after that point.

kpreid · August 23, 2022, 2:36pm

The Pin in Future::poll's type guarantees to the future that it won't be moved between polls. It prevents the future from being polled (by any code that chooses to call Future::poll [1]) unless it is pinned, so that the code in the implementation of poll() can rely on not being moved.

[1] Remember that poll() is an ordinary method, not even unsafe, and can be called by anything, and that “async runtime” is not a privileged concept as far as the compiler and memory safety rules are concerned.
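
To make the footnote concrete, here is a minimal sketch of plain safe code calling poll with no runtime involved. The NoopWake type and the poll_once helper are names invented for this illustration; only the std items are real.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Illustrative waker that does nothing; enough for polling by hand.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Polls `fut` exactly once from ordinary safe code.
/// The only thing the caller must supply is a pinned pointer.
fn poll_once<F: Future>(fut: Pin<&mut F>) -> Poll<F::Output> {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    fut.poll(&mut cx)
}

fn main() {
    // Box::pin is a safe way to obtain a pinned future.
    let mut fut = Box::pin(async { 40 + 2 });
    // as_mut() reborrows Pin<Box<F>> as Pin<&mut F>.
    let result = poll_once(fut.as_mut());
    assert_eq!(result, Poll::Ready(42));
}
```

Nothing here is unsafe and nothing here is a runtime, which is why the type of self has to carry the no-move guarantee.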

alice · August 23, 2022, 5:42pm

The code that pins the future uses unsafe to create a Pin<&mut F>. The code that creates the self-referential stuff looks at the Pin<&mut F> it was given and knows that someone used unsafe to create it.
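
As a rough sketch of where a pin can come from: the NotUnpin type below is a made-up stand-in for a compiler-generated future. Two of the three routes are safe wrappers from std (which contain the unsafe internally), and the third is the raw unsafe constructor.

```rust
use std::marker::PhantomPinned;
use std::pin::{pin, Pin};

/// Hypothetical type that opts out of Unpin, like many async-block futures.
struct NotUnpin {
    value: u32,
    _pin: PhantomPinned,
}

/// Accepts only a pinned pointer, the way Future::poll does.
fn read(p: Pin<&mut NotUnpin>) -> u32 {
    p.value // Pin<&mut T> derefs to &T for reading
}

fn demo() -> (u32, u32, u32) {
    // 1. Safe: Box::pin moves the value to the heap; Box makes the promise.
    let mut boxed = Box::pin(NotUnpin { value: 1, _pin: PhantomPinned });
    let a = read(boxed.as_mut());

    // 2. Safe: pin! pins a value in the current stack frame.
    let on_stack = pin!(NotUnpin { value: 2, _pin: PhantomPinned });
    let b = read(on_stack);

    // 3. Unsafe: the author of this block makes the promise personally.
    let mut raw = NotUnpin { value: 3, _pin: PhantomPinned };
    // SAFETY: `raw` is never moved again; it is only dropped in place
    // when this function returns.
    let pinned = unsafe { Pin::new_unchecked(&mut raw) };
    let c = read(pinned);

    (a, b, c)
}

fn main() {
    assert_eq!(demo(), (1, 2, 3));
}
```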

jonh · August 23, 2022, 11:49pm

It's for performance. Without Pin there would be added runtime cost. A Future you spawn is typically a chain of Futures, each with a poll method and its own memory. The pin ensures the entire lot gets a fixed location; otherwise each would be using its own boxing to stop movement. (I think it had something to do with embedded use too.)

The lines were (to me) clearer in the futures 0.1 days before async/await, and Futures are close in style to Iterators. The helper functions never made it into the standard library, so some reside in FutureExt.

(As others mentioned) the safeguard in writing your own poll is that you may end up needing variables that are not Unpin, in which case you have to use an unsafe block, which should then receive extra review. (Not that many will be writing poll directly.)
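
A minimal sketch of that last point, assuming a hand-written wrapper (the Doubled type and the NoopWake waker are invented for this example): to poll a field that may not be Unpin, the wrapper has to project the pin onto the field, and without a helper crate that projection is an unsafe call.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical combinator: doubles the output of the future it wraps.
struct Doubled<F> {
    inner: F, // may be !Unpin, e.g. an async block
}

impl<F: Future<Output = u32>> Future for Doubled<F> {
    type Output = u32;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<u32> {
        // SAFETY: `inner` is only reached through this pinned projection
        // and is never moved out of `self`; `Doubled` has no Drop impl
        // and is not repr(packed).
        let inner = unsafe { self.map_unchecked_mut(|this| &mut this.inner) };
        inner.poll(cx).map(|v| v * 2)
    }
}

/// Illustrative waker that does nothing.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

fn poll_doubled() -> Poll<u32> {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(Doubled { inner: async { 21u32 } });
    fut.as_mut().poll(&mut cx)
}

fn main() {
    assert_eq!(poll_doubled(), Poll::Ready(42));
}
```

The unsafe block is the part that should receive the extra review: its SAFETY comment is a claim the compiler cannot check.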

FrankReh · August 24, 2022, 12:09am

Thank you. Four pieces to the puzzle I hadn't been considering.

1. So the Pin in the signature is a promise, enforced by the compiler, that the data won't be moved between calls to poll. (I don't have a good feeling why such a promise is important when so many other things we program against are based on the semantics of the API as described in documentation alone - but I digress.)
2. The self-referential pieces of interest here are actually set up in the first or subsequent calls to poll, not before the first poll. So again, any self references set up in one call to poll would still be valid in subsequent calls to poll for the same future F.
3. Unsafe was used to create the pin. That's interesting, but I don't see how that helps the code that is creating the self-referential stuff. But it sounds like it helps more than just having a documented promise that the runtime won't be moving the future now that poll is being called. (Maybe I'm still missing something that is subtle, but it could also be staring me in the face now and tomorrow I will see it.)
4. And performance gains of a single memory allocation for a chain or tree of futures. If that is true, I'm confused a little further because it seemed the future needed to be pinned before passing it to the runtime. I had seen the pin! macro used in some discussions of how to create a future manually before passing it to the runtime, but I'm not really sure about that at this point. It sure would be nice to have a picture created of how an entire lot of future locations are fixed by the runtime.

@alice I think my biggest remaining question is how or why knowing the caller used unsafe to create the pin helps.

kpreid · August 24, 2022, 1:07am

It is not enforced by the compiler, exactly. Rather, by creating a Pin<P> (where P is some pointer type), the creator makes a promise that it's not going to move the referent of P. That creation is done using an unsafe function; the reason it is unsafe is that the compiler cannot check the promise that the referent of P won't be moved. (Not every use of Pin requires writing new unsafe code; for example, you can call Box::pin() to create a Pin<Box<T>>. The Box is doing the promising, here.)

Once the Pin is created, the part that the compiler does track is the fact that the type is Pin<P> — ordinary type checking and type inference — which has the effect, due to what functions are available to operate on Pin, of ensuring that the referent can't get “unpinned” and move (unless it is safe to do so).
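
A small sketch of what "what functions are available to operate on Pin" means in practice. The Movable and Stuck types are hypothetical; the point is that get_mut is only offered when the target is Unpin, so a !Unpin value cannot be moved out through safe code.

```rust
use std::marker::PhantomPinned;
use std::pin::Pin;

/// Ordinary type: Unpin is implemented automatically.
struct Movable(u32);

/// Hypothetical type that opts out of Unpin.
struct Stuck {
    value: u32,
    _pin: PhantomPinned,
}

fn swap_unpin() -> (u32, u32) {
    let mut a = Movable(1);
    let mut b = Movable(2);
    // Pin::new is safe because Movable: Unpin.
    let pa = Pin::new(&mut a);
    let pb = Pin::new(&mut b);
    // get_mut hands back a plain &mut, so the values can be swapped (moved).
    std::mem::swap(pa.get_mut(), pb.get_mut());
    (a.0, b.0)
}

fn replace_pinned() -> u32 {
    let mut p = Box::pin(Stuck { value: 5, _pin: PhantomPinned });
    // Reading through the pin is always allowed.
    let before = p.value;
    // p.as_mut().get_mut() would not compile here: Stuck is not Unpin.
    // Pin::set overwrites the value in place; the old one is dropped, not moved.
    p.set(Stuck { value: before + 1, _pin: PhantomPinned });
    p.value
}

fn main() {
    assert_eq!(swap_unpin(), (2, 1));
    assert_eq!(replace_pinned(), 6);
}
```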

> (I don't have a good feeling why such a promise is important when so many other things we program against are based on the semantics of the API as described in documentation alone - but I digress.)

The key distinction is that the result of violating the Pin contract would be unsoundness (or memory unsafety) due to dangling pointers (pointers that no longer point to an existing allocation with valid contents). One of the foundational principles of Rust is that you will never encounter a dangling pointer unless some unsafe code did something wrong.

> The self-referential pieces of interest here are actually set up in the first or subsequent calls to poll, not before the first poll. So again, any self references set up in one call to poll would still be valid in subsequent calls to poll for the same future F.

Yes. For a concrete example:

let my_future = async {
    let mut x: Vec<i32> = get_data();
    some_other_async_fn(&mut x).await;
    println!("done!");
};

When my_future, the Future generated by this async block, is in its await, it has a self-reference: the future from some_other_async_fn (which is stored as part of my_future) contains a &mut Vec<i32> pointing at the local variable x (which is also stored as part of my_future). But, before the future has been polled, none of the code in it has run — get_data() hasn't been called and x doesn't exist yet — so there are no self-references and the future is safe to move.
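
Here is a runnable variant of the snippet above, as a sketch only. The bodies of get_data and some_other_async_fn, the YieldOnce helper, and the NoopWake waker are all assumptions added so that the future actually suspends once; the block also returns a value instead of printing. It shows the future being moved freely before its first poll and then staying put once pinned.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Illustrative waker that does nothing.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Hypothetical helper future: Pending on the first poll, Ready on the second.
struct YieldOnce(bool);

impl Future for YieldOnce {
    type Output = ();

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            Poll::Ready(())
        } else {
            self.0 = true;
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}

// Stand-in bodies; the original post does not define these.
fn get_data() -> Vec<i32> {
    vec![1, 2, 3]
}

async fn some_other_async_fn(x: &mut Vec<i32>) {
    YieldOnce(false).await; // suspend while holding the &mut borrow of x
    x.push(4);
}

fn run_demo() -> (bool, Poll<usize>) {
    let my_future = async {
        let mut x: Vec<i32> = get_data();
        some_other_async_fn(&mut x).await;
        x.len()
    };
    // Not polled yet: no code has run, so moving it is fine.
    let moved = my_future;
    // From here on the future stays at one heap address.
    let mut pinned = Box::pin(moved);

    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    // First poll: x now exists and the inner future borrows it.
    let first = pinned.as_mut().poll(&mut cx);
    // Second poll: the borrow is still valid because nothing moved.
    let second = pinned.as_mut().poll(&mut cx);
    (first.is_pending(), second)
}

fn main() {
    assert_eq!(run_demo(), (true, Poll::Ready(4)));
}
```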

(It's useful that futures are movable before being polled because, for example, any time you write a function that returns a future, that future is being moved.)

> ... It sure would be nice to have a picture created of how an entire lot of future locations are fixed by the runtime.

It's not the runtime's business; rather, every future that you write that uses other futures contains those futures. In my above code sample, my_future contains the future that some_other_async_fn returns. It's sort of like if a function pre-allocated all the stack space that would ever be necessary for all the functions that it will call. [1] So the size_of() of that future is exactly that maximum space required, and the async runtime merely needs to allocate space for that (usually inside a Box or similar) and not move it.
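
The containment can be observed with size_of_val, as in this rough sketch. The functions inner, outer, and depth are made up for the illustration, and the exact sizes depend on the compiler version, so only the inequalities are relied on. The boxed recursive function shows why recursion needs indirection: its handle is just a fat pointer, so the future no longer has to contain itself.

```rust
use std::future::Future;
use std::mem::size_of_val;
use std::pin::Pin;

/// Holds a 1 KiB buffer across an await, so the buffer lives in the future.
async fn inner(seed: u8) -> u8 {
    let buf = [seed; 1024];
    std::future::ready(()).await;
    buf[usize::from(seed)]
}

/// Awaits `inner`, so the future for `inner` is stored inside this one.
async fn outer(seed: u8) -> u8 {
    inner(seed).await
}

/// Recursive async code: each level is boxed, so the size stays finite.
fn depth(n: u32) -> Pin<Box<dyn Future<Output = u32>>> {
    Box::pin(async move {
        if n == 0 {
            0
        } else {
            1 + depth(n - 1).await
        }
    })
}

/// Returns (size of inner future, size of outer future, size of boxed handle).
fn sizes() -> (usize, usize, usize) {
    (
        size_of_val(&inner(0)),
        size_of_val(&outer(0)),
        size_of_val(&depth(3)),
    )
}

fn main() {
    let (inner_size, outer_size, boxed_size) = sizes();
    println!("inner: {inner_size} bytes, outer: {outer_size} bytes, boxed: {boxed_size} bytes");
    assert!(inner_size >= 1024);
    assert!(outer_size >= inner_size);
}
```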

This isn't a special thing about futures; it's exactly the same concept as if you write an enum A that contains another enum B — the size of A will be the size required to represent all possible states of A, including "contains a B" together with the B value. Then, given an &mut A, you can write any possible A into it.

[1] "Wouldn't that prevent recursion?" Yes, it does! You will get an error if you write a recursive async function without special measures like Boxing a future.

2e71828 · August 24, 2022, 5:54am

One way of looking at the whole unsafe system is in terms of assigning blame: When memory safety violations occur, which code (and therefore author) is at fault? In Rust, code that appears outside of an unsafe block is never the root cause of a safety violation; that has to come from either the compiler or a piece of unsafe code.

By using unsafe to pin a value, the author is volunteering to take the blame if the value later gets moved¹. In the case of futures, they can then do things like holding self-references which are only safe if there is a guarantee in place.

The alternative, as you point out, would be for Future::poll to be an unsafe fn with the not-moving restriction listed as a safety condition. There’s nothing inherently wrong with this approach, but it will mean programs have more unsafe blocks that have to be audited when weird errors occur. For example, the Pin system lets you write a proxy future that forwards its poll implementation to somewhere else without writing unsafe.
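
One possible shape for such a proxy, as a sketch under assumptions (the Proxy type and NoopWake waker are invented here, and this may not be the exact design the post has in mind): because the inner future is already held as Pin<Box<...>>, the wrapper itself is Unpin and can forward poll in entirely safe code.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical proxy that forwards poll to a boxed, already-pinned future.
struct Proxy<T> {
    inner: Pin<Box<dyn Future<Output = T>>>,
}

impl<T> Future for Proxy<T> {
    type Output = T;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<T> {
        // No unsafe needed: Pin<Box<_>> is Unpin, so Proxy is Unpin and
        // `self.inner` is reachable through DerefMut.
        self.inner.as_mut().poll(cx)
    }
}

/// Illustrative waker that does nothing.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

fn poll_proxy() -> Poll<String> {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut proxy = Proxy {
        inner: Box::pin(async { String::from("done") }),
    };
    // Proxy is Unpin, so the safe Pin::new constructor is available.
    Pin::new(&mut proxy).poll(&mut cx)
}

fn main() {
    assert_eq!(poll_proxy(), Poll::Ready(String::from("done")));
}
```

If poll were an unsafe fn instead, the forwarding call inside Proxy::poll would need its own unsafe block and safety argument.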

¹ Unless there’s another, more relevant, unsafe block in the picture.

FrankReh · August 24, 2022, 5:32pm

Thank you! It took four of you spending time reading my question and understanding my confusion or what I didn't know. I will go out on a limb now and say it makes complete sense to me. Even all the discussions from years ago. Even the idea that knowing it took unsafe code to create the pin makes sense now.

The Rust teams have created so many useful systems and protocols for getting things done and moving forward. Somehow a new person to Rust was allowed to fall through the cracks, though, and read man pages and read source code and read discussions, without ever being made aware of the soundness guarantee the Rust teams are always trying to provide with their language features and the libraries. Luckily for me, my question about why it wasn't enough to just document the proper use of the Future::poll method showed my ignorance on the subject.

Everything I've read from the core team members makes more sense from the angle that they are always trying to guarantee a sound public API. The notion that a naive user can't misuse a library's API and cause undefined behavior is very important to everyone who is really working on Rust. If folks want to use Rust and not care about soundness in their own modules, that's entirely their choice. But the language and the std libraries are meant to be sound, by their definition, and that is very nice IMHO.

A new person can read a lot and not know what is meant by safe when it is used in discussions or responses. safe usually means not requiring unsafe, and unsafe is vitally important to understand if one is trying to follow along. And in the case of the Future::poll method, requiring a Pin to call it (a Pin that someone, somewhere, had to use unsafe to create) means poll could be provided without violating the soundness promise. The method is sound if used properly, and whoever used unsafe to create the Pin has taken the responsibility for it being used properly.
