Sort of moveable safe self-referential types based on offset that is known at compile time?

Hi everybody, I'm Daniel. I'm still new to Rust and have been spending lots of time reading and watching material about async in Rust, and all that Pin stuff.

I'm not very versed in Rust, but I have some background in C, so I wrote some code to illustrate what I'm trying to explain here. The whole idea is about why we need Pin, and whether we really need it. Again, I'm pretty new to Rust, so I may be saying some nonsense here. Well, let's go...

Back when I used to write C, I had this problem: you want to hand a pointer to the user of your library, but there is some data that needs to travel together with the user's data, and you want to keep it hidden from the user. What we did at that time was allocate a struct on the heap and give the user an opaque pointer to one member of that struct. The struct is our private data; the public data is just the opaque pointer that we give to the user...

Now, sooner or later we have to retrieve that private data, and to do so the user passes their pointer back to us. Since the pointed-to member lives inside the struct, the address of the struct is just a function of the address held by the opaque pointer, so it doesn't matter where the struct goes: our private data is always there (as long as we take care of the moving part; this is the whole point of the pointer being opaque: the user doesn't know its memory layout, so they depend on us to handle/move/allocate/free it for them).

I wrote some C code to show how this would be possible, https://gist.github.com/dhilst/d5f3ca90a12840b4f17981f4e15674bd

The point is, we know the offsets of struct members at compile time. Can't we encode this in the type system so that we get safe self-referencing pointers, as something like OffsetRef (I don't know the right name), a type that encodes that offset and resolves the dereference at compile time (zero-cost abstractions)? Is that possible?
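For readers who want the C pattern above in Rust terms, here is a minimal sketch. It is not taken from the linked gist; the `Record` struct and the helpers `handle_of` and `record_from_handle` are hypothetical names made up for illustration, and it assumes a toolchain where `std::mem::offset_of!` is available (Rust 1.77 or later).

```rust
use std::mem::offset_of;

// Hypothetical library-private record; `handle` is the only member the user sees.
#[repr(C)]
struct Record {
    secret: u64,
    handle: u32,
}

// Known at compile time, like C's offsetof(struct record, handle).
const HANDLE_OFFSET: usize = offset_of!(Record, handle);

/// Hand out a pointer to the `handle` field of a record.
fn handle_of(record: &Record) -> *const u32 {
    let base: *const Record = record;
    // Project to the field through a raw pointer to the whole record,
    // so the result may later be walked back to the record start.
    unsafe { std::ptr::addr_of!((*base).handle) }
}

/// Recover the containing record from a pointer to its `handle` field.
///
/// # Safety
/// `handle` must come from `handle_of` on a `Record` that is still alive
/// and has not been moved since.
unsafe fn record_from_handle<'a>(handle: *const u32) -> &'a Record {
    unsafe {
        // Step back by the compile-time offset to reach the struct start.
        let base = (handle as *const u8).sub(HANDLE_OFFSET);
        &*(base as *const Record)
    }
}
```

Note that the offset is a compile-time constant, but the safety of `record_from_handle` still rests on a promise the type system does not check, which is why it is an `unsafe fn` in this sketch.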

It seems like you are proposing relative pointers, for which I have made a crate: rel-ptr.

The reason why these can't replace Pin<_> is because of code like this:

let x: &T = if some_condition {
    self_ref
} else {
    not_self_ref
};

x can't be turned into a relative pointer. Even though that would be fine to do with self_ref, it would not work with not_self_ref, because self can be moved independently of not_self_ref's pointee. This means that if not_self_ref were a relative pointer, it would be invalidated whenever self was moved.
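To make the difference concrete, here is a small sketch in safe Rust that only compares addresses and never dereferences a computed pointer. `Holder`, `offset_from_holder` and `offsets_across_move` are hypothetical names for this illustration. It measures the byte distance from a struct to one of its own fields and to an unrelated value, before and after the struct is moved to the heap.

```rust
struct Holder {
    data: u32,
}

/// Byte distance from the start of `holder` to `target`.
fn offset_from_holder(holder: &Holder, target: &u32) -> isize {
    let base = holder as *const Holder as isize;
    let dest = target as *const u32 as isize;
    dest.wrapping_sub(base)
}

/// Returns [self_before, self_after, external_before, external_after].
fn offsets_across_move() -> [isize; 4] {
    let external = 5u32;
    let holder = Holder { data: 1 };

    let self_before = offset_from_holder(&holder, &holder.data);
    let external_before = offset_from_holder(&holder, &external);

    // Move the holder to the heap; `external` stays where it is.
    let moved = Box::new(holder);

    let self_after = offset_from_holder(&moved, &moved.data);
    let external_after = offset_from_holder(&moved, &external);

    [self_before, self_after, external_before, external_after]
}
```

The distance to the struct's own field is the same before and after the move, while the distance to the external value changes, so a stored offset to the external value would now point at the wrong place.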

Now, about rel-ptr. It doesn't encode the offsets at compile time, and it doesn't need to, but I do have another (unreleased) crate generic-field-projections that can encode such properties at compile time (but it is far more limited than rel-ptr). This one only deals with fields directly, and was more of a proof of concept than anything I would actually use.

rel-ptr stores an integer offset and uses its current position to figure out where the pointee is, much like your C code. You need to set up the relative pointer by passing it a reference or a raw pointer to the pointee, and then you are good to go. It doesn't matter how much you move the type: as long as the pointee moves along with it, the relative pointer will always be valid.
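The idea can be sketched roughly as follows. This is not the rel-ptr API; `Node` and its methods are hypothetical, and to keep the example short the offset is measured from the start of the containing struct rather than from the position of the pointer field itself.

```rust
struct Node {
    left: u32,
    right: u32,
    // Byte offset from the start of this Node to the chosen field.
    chosen: usize,
}

impl Node {
    fn new(left: u32, right: u32, pick_right: bool) -> Self {
        let mut node = Node { left, right, chosen: 0 };
        // Set up the relative pointer from an ordinary pointer to the pointee.
        let target: *const u32 = if pick_right { &node.right } else { &node.left };
        let base: *const Node = &node;
        node.chosen = target as usize - base as usize;
        node // the node is moved out here; the stored offset stays valid
    }

    fn chosen(&self) -> &u32 {
        let base = self as *const Node as *const u8;
        // SAFETY: `chosen` is only ever set in `new`, to the offset of a u32
        // field of this same struct, and field offsets do not change on move.
        unsafe { &*(base.add(self.chosen) as *const u32) }
    }
}
```

Because only an offset is stored, the node can be returned by value, boxed, or pushed into a Vec, and `chosen()` still resolves to the right field.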

Hi!

Thanks for rel-ptr, I didn't know about it... And yeah, we can't use the same type to represent an offset and an address, since an offset needs a base address to make sense, so they are definitely not the same.

About Pin, I can't fully understand why it is needed. Is it about the state that needs to be held between future polls? Isn't it too restrictive?

It is there to prevent the future from moving between polls. This is important because in the presence of clever unsafe code, we can't know what is representing a pointer and what isn't. (RelPtr is a nice example of this. By all rights, it's literally an integer, but it is also a pointer.) So we can't just replace all pointers with relative pointers and be on our way, because that would break a large chunk of perfectly good unsafe code.

So the only other option is to enforce immovability. This would have been best done through another trait in a similar vein to Copy; let's call it Move. Now, in order to preserve backwards compatibility, Move would have to be implemented by default. This would require a lot of churn for generic libraries, because they weren't designed with !Move types in mind (since those didn't exist). This would risk splitting the community right down the middle.

So in order to side-step these daunting issues, Pin was born. It has a clever API that gives just enough guarantees for async to work, and almost nothing more. The cleverness comes from its use of unsafe to enforce the necessary guarantees.
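As a rough illustration of that API, here is a minimal sketch of a self-referential type that relies on Pin, modeled on the pattern shown in the `std::pin` documentation. `SelfRef` and its methods are hypothetical names, not anything from the async machinery itself.

```rust
use std::marker::PhantomPinned;
use std::pin::Pin;

struct SelfRef {
    text: String,
    // Raw pointer to `text`; only valid while this value stays in place.
    text_ptr: *const String,
    // Opts out of Unpin, so safe code cannot move the value out of a Pin.
    _pin: PhantomPinned,
}

impl SelfRef {
    fn new(text: &str) -> Pin<Box<Self>> {
        let mut boxed = Box::pin(SelfRef {
            text: text.to_owned(),
            text_ptr: std::ptr::null(),
            _pin: PhantomPinned,
        });
        let ptr: *const String = &boxed.text;
        // SAFETY: we only overwrite a field; the value is never moved.
        unsafe { boxed.as_mut().get_unchecked_mut().text_ptr = ptr };
        boxed
    }

    fn text_via_ptr<'a>(self: Pin<&'a Self>) -> &'a str {
        // SAFETY: `text_ptr` was set in `new` to point at `self.text`, and
        // the Pin guarantees the value has not moved since then.
        let text: &'a String = unsafe { &*self.text_ptr };
        text.as_str()
    }
}
```

The `Pin<Box<SelfRef>>` handle itself can be moved around freely; what Pin rules out (in safe code) is moving the `SelfRef` it points to, which is exactly what keeps `text_ptr` valid.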

tl;dr: the only way to support both unsafe and async is to have immovable types. But the Rust team thought true immovable types would cause too much churn and might split the community, so they offered an unsafe primitive that gives most of the same guarantees. Relative pointers don't allow the full breadth of possibilities that Pin does.

Yeah, it's a tiny crate. I would have been surprised if you did know it.

Hmm, can you point me to where I can read about this unsafe vs async stuff?

Thank you! Cheers!

Since async is relatively new, there isn't much material on it just yet. I don't remember where I saw the exact issue for unsafe vs async, but something as simple as a usize that encodes a pointer can throw off this analysis, so I don't think auto-converting pointers to relative pointers is viable.
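A small sketch of what "a usize that encodes a pointer" can look like, using hypothetical names (`Tagged`, `make_tagged`, `addr_is_current`). It is entirely safe Rust: the address is stored and compared, never dereferenced.

```rust
struct Tagged {
    value: u32,
    // Address of `value`, stored as a plain integer. Nothing in the type
    // tells the compiler that this number is really a pointer.
    value_addr: usize,
}

fn make_tagged(value: u32) -> Tagged {
    let mut t = Tagged { value, value_addr: 0 };
    t.value_addr = &t.value as *const u32 as usize;
    t
}

/// Does the stored integer still match where `value` lives now?
fn addr_is_current(t: &Tagged) -> bool {
    let now = &t.value as *const u32 as usize;
    t.value_addr == now
}
```

After the value is moved (for example into a Box), the stored integer is stale, and no automatic rewrite of "pointers" could have known it needed fixing up.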

Posting this to give more examples, in case that helps! Might not, but just in case, to illustrate.

I think the key feature we want is to be able to write an async fn which saves arbitrary rust data structures across .await points. This code:

async fn x() {
    let source: Xxx = some_func();
    let borrow: Yyy = source.borrow_func();
    other_fut().await;
    use_value(borrow);
}

should compile for any data structure Xxx and borrowed value Yyy. If Xxx is u8 and Yyy is &u8, this is easy. But it gets harder if Xxx is an arbitrary data structure, and Yyy isn't a direct reference.
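Here is a runnable variant of that shape, as a sketch only. The names `YieldOnce`, `NoopWaker`, `block_on` and `first_plus_last` are made up for this example, and the tiny busy-polling executor stands in for a real runtime. The array lives inside the future's own state, so the borrow held across the .await makes the future self-referential.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// A future that returns Pending once, so state must be kept across polls.
struct YieldOnce {
    yielded: bool,
}

impl Future for YieldOnce {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        if self.yielded {
            Poll::Ready(())
        } else {
            self.yielded = true;
            Poll::Pending
        }
    }
}

async fn first_plus_last() -> u32 {
    let source = [1u32, 2, 3]; // stored inline in the future
    let borrow = &source[0]; // points into the future itself
    YieldOnce { yielded: false }.await;
    *borrow + source[2]
}

struct NoopWaker;
impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Toy executor: pins the future on the heap and polls until it is ready.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}
```

Note that `poll` takes `Pin<&mut Self>`: once the first poll has created `borrow`, the future must not move, and the signature is what enforces that.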

For instance, what if I have a Vec<u8> and &u8 reference to the first element? This is easy, it should just be a regular reference because the elements are on the heap. But... does the compiler really know that?

Or, maybe more complicated, what if I have a smallvec::SmallVec<u8> and a reference to its first element? SmallVec is either allocated on the heap or inline data depending on its size, so it's non-trivial, or dependent on runtime data, to figure out.
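A quick way to see the two cases, sketched with std only: a `Vec<u8>` for the heap case and a hypothetical `Inline` struct wrapping a plain array as a stand-in for SmallVec's inline storage (the real SmallVec is not used here). The code only compares addresses.

```rust
/// Stand-in for inline storage: the elements live inside the value itself.
struct Inline {
    buf: [u8; 3],
}

fn first_elem_addr(slice: &[u8]) -> usize {
    slice.as_ptr() as usize
}

/// Returns (heap_addr_unchanged, inline_addr_unchanged) after moving both
/// containers to a new location.
fn addresses_survive_move() -> (bool, bool) {
    let heap = vec![1u8, 2, 3];
    let inline = Inline { buf: [1, 2, 3] };

    let heap_before = first_elem_addr(&heap);
    let inline_before = first_elem_addr(&inline.buf);

    // Move both containers; boxing is just a convenient way to force a move.
    let heap_moved = Box::new(heap);
    let inline_moved = Box::new(inline);

    (
        heap_before == first_elem_addr(&heap_moved),
        inline_before == first_elem_addr(&inline_moved.buf),
    )
}
```

Moving the Vec only moves its pointer, length and capacity, so the element address is unchanged; moving the inline container moves the elements themselves. A container that switches between the two at runtime can't be classified statically.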


It's true that we as programmers could write asynchronous code using relative pointers and get this to work. But as a language, we don't just want to enable handwritten async code; we want code which flows logically and can use Rust's full set of abstractions. With Pin, the compiler doesn't have to care or introspect at all, and people can write async code without having to be careful about borrows. It just works.

There's a small cost, but it's not even that large of a cost overall. Heap allocation trivially fulfills Pin, and async fn/.await internals can write safe embeddings of other futures so there only needs to be a heap allocation at the top level. I'd say it's a win-win?
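To illustrate the heap-allocation point, here is a small sketch; `pointee_addr` and `pin_and_move` are hypothetical helper names, and `std::future::ready` is used only as a convenient ready-made future. Once a future is in a `Pin<Box<_>>`, moving the handle around does not move the future.

```rust
use std::pin::Pin;

/// Address of the value a pinned box points to.
fn pointee_addr<T>(pinned: &Pin<Box<T>>) -> usize {
    let target: &T = &**pinned;
    target as *const T as usize
}

/// Returns the pointee address before and after moving the handle.
fn pin_and_move() -> (usize, usize) {
    // One heap allocation at the top level pins the whole future.
    let fut = Box::pin(std::future::ready(7u32));
    let before = pointee_addr(&fut);

    // Moving the Pin<Box<_>> (here, into a Vec) only moves the pointer.
    let mut queue = Vec::new();
    queue.push(fut);
    let after = pointee_addr(&queue[0]);

    (before, after)
}
```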


For a lot more of this kind of reasoning, I recommend reading through withoutboats's blog series on designing Pin, and async/await in general. First post is here.
