Relative Pointer: an abstraction to build movable self-referential types

https://crates.io/crates/rel-ptr

Use relative pointers to build safe movable self-referential types or minimize the size of your pointers!

Example of self referential struct

use rel_ptr::RelPtr;

struct SelfRef {
    value: (String, u32),
    ptr: RelPtr<String>
}

impl SelfRef {
    pub fn new(s: String, i: u32) -> Self {
        let mut this = Self {
            value: (s, i),
            ptr: RelPtr::null()
        };
        
        this.ptr.set(&this.value.0).unwrap();
        
        this
    }

    pub fn fst(&self) -> &str {
        unsafe { self.ptr.as_ref_unchecked() }
    }

    pub fn snd(&self) -> u32 {
        self.value.1
    }
}

fn main() {
    let s = SelfRef::new("Hello World".into(), 10);
 
    assert_eq!(s.fst(), "Hello World");
    assert_eq!(s.snd(), 10);
    
    let s = Box::new(s); // force a move, note: relative pointers even work on the heap
     
    assert_eq!(s.fst(), "Hello World");
    assert_eq!(s.snd(), 10);
}
11 Likes

Nice!

Definitely interesting. Although given the amount of optimizations Rust may do regarding struct layouts, I wonder about UB lurking nearby. It's UB to use with a #[repr(packed)] struct, but I wonder if, as long as it is not the case, is it safe, then? Maybe we would need #[repr(linear)] or #[repr(C)]?

(I apologise if you have tackled this issue in the doc, I have just had the time for a very lossy fast sweep of your crate).

But I definitely like the idea :slight_smile: : for self referential structs it looks way more lightweight than Pinning. :sweat_smile:

1 Like

Since I like the idea a lot, I would like to help you in tackling the unsafe-ty and documenting the correct usage of your crate :wink:

So I've decided to do look at the code more closely:

    pub fn set(&mut self, value: &T) -> Result<(), I::Error> {
        // <snip>
    }
// <snip>
    pub unsafe fn as_mut(&mut self) -> Option<&mut T> {
        // <snip>
    }

This, for instance, is UB, even when the pointers are valid: you are effectively transmuting your initial value: &T into a &mut T.

To fix it, I think you'll need two RelPtr wrappers, one initialized with a &T and lending &Ts only, and another one, say RelPtrMut, initialized with a &mut T and thus allowed to lend &mut T (conversion from the latter to the former would be allowed, but not the other way around).

6 Likes

Ok, that's a good catch, only 1 thing to note is that RelPtr is more like a raw pointer than a reference, so even if I make that distinction, it would be trivial to make aliasing mutable pointers.

No UB related to struct optinizations because the only thing I rely on is that layouts can't change at runtime.

I'm pretty sure that with #[repr(packed)] they can :grimacing:

Yes of course, but your current interface is the one of a *mut T, so it should be inited by a &mut T. You would gain some extra benefits, also. By taking a &mut _, you could be sure that the offset is non zero and use NonZeroI8. You could then wrap it into an Option and thus be able to keep track of its initialised state, or just let users do that, and get enum layout optimization :slight_smile:

One question remains, though: @RalfJung how come ::core::ptr::NonNull<T>, which has .as_mut() implements From<&'_ T>? Shouldn't there at least be some warning in the documentation against the combination of the two?

Good point. NonNull::from(&x).as_mut() is definitely UB in my model -- this creates a mutable reference from a shared one.

2 Likes

No. repr(packed) just means no padding gets added, that's all.

For DSTs, the offset is only known at run-time, but it will never change for any given object (that'd require adjusting the data). So if you have (u8, dyn Debug) then the offset of the 2nd field depends on the alignment of the actual type of the second field, which is determined through the vtable.

I was referring to:

So, since it should be noted that field addresses (and thus "offsets") change on drop for #[repr(packed)] structs, I think there should be an important warning ("use at your own risk") against using #[repr(packed)] with self-referential structs + custom Drop relying on the RelPtrs being still valid.

Looks like that happens after Drop, which means RelPtr is still valid. It wouldn't be invalid to do things at Drop. There is some subtlety though, so I will document that.

It would be easy to plug in NonZero* types into my system, but that would require nightly until the next release cycle. I'll put it behind the nightly flag for now, and start prototyping the api for NonZero* types. I don't want to pull in a new crate that will be dropped at the next release cycle.

1 Like

You misunderstood. The struct offsets don't change. How would that even work?
What happens is that to drop the fields, these are moved to their own, properly aligned location. They are no longer part of the struct at that point.

Although clearly an antipattern (the drop logic should belong to SelfRef instead of Thing), I was imagining someone doing something along these lines:

Since it wasn't really clear to me when the "moving things around on drop" was actually taking place, it seemed to me that there could have been an issue with that kind of code. Now that I see that the moving really takes place "afterwards", I agree that there can't be an issue with relative offsets. My bad.

1 Like

A simpler case would be

#[repr(packed)]
struct Base(String, UnsafeThing);

struct UnsafeThing(RelPtr<String>); // points into Base

impl Drop for UnsafeThing {
    fn drop(&mut self) {
        // ... accessing `RelPtr<String>` here is UB
    }
}

Yep, that's what my example is about, with some ffi-flavour added to it to justify the self-referential struct and also using ManuallyDrop to enforce drop order, else it's clearly UB.

Are you sure Rust won't ever make an optimisation like that? I could imagine sometimes being better to generate one version of the struct for „normal“ storage (eg. inside a Vec or variable) and another version that is stored inside some enum or bigger struct (maybe even „scattered“ around other fields). Then rustc could generate two versions of eg. Debug::fmt implementations, depending on what version is there and having conversion code when moving the struct into/outside from the bigger struct.

I think the current Rust doesn't promise to not do such optimisation. After all, repr(rust) is unspecified ‒ which probably means it's unspecified if its the same in all contexts.

Yes, lets look at an example.

let x = vec![Stuct { ... }]; // this type doesn't really matter, it just shouldn't be `Struct`

let y = Struct { ... };

let z: &Struct if random {
    &x[0]
} else {
    &y
};

If the layout of Struct could change at run time, then dereferencing z would be UB or z would not be the same size as a normal pointer, both of which are extremely undesirable. Rust has the guarantee that references to Sized types will be the same size as a normal pointer and no UB in safe code.

Because of this it must have the guarantee that layouts cannot change at runtime.

For all other types,

Slices have gaurenteed layout. As does str, and traits objects have a type which they are guaranteed to be layout compatible with on nightly.

All other types are not supported unless support is explicitly added to them by the users of my crate.

https://doc.rust-lang.org/std/raw/struct.TraitObject.html

From the docs:

For those familiar with pointers, a reference is just a pointer that is assumed to not be null. In fact, Option<&T> has the same memory representation as a nullable pointer, and can be passed across FFI boundaries as

1 Like

All right then. :slight_smile: We should probably try to clarify that sentence in the documentation, though.

We talked about layout guarantees in the unsafe code guidelines WG, and nobody suggested layout could change at run-time... and I think the only reason it wasn't ruled out is that we were not creative enough.^^ I am going to open this as a question over there.

1 Like

That is not how optimizers work. A lot of optimisations are/can be applied only if the compiler can prove something doesn't happen. So, if the compiler could prove (from the code and guarantees promised) that the two variants of the reference never meet at the same place, it could do such optimisation ‒ like, it could basically generate two different types if the only place they meet is some kind of assignment, where it would convert them.

I'm not saying it is likely someone will do such optimisation. But the fact that some code would prevent the optimisation isn't reason to state that such optimisation can never be done.

For example, a C compiler could reorder struct fields if it can prove that nobody ever takes pointer of them or the struct, the code in current .c code would not break by that and that nobody outside of the .c module would ever see any variable of that type.

Actually I think that cannot work. An &mut T could point to inside an enum or a bigger struct or a Vec, and yet it must always work the same for a fixed T. Layout cannot depend on the context in which a type is used.

EDIT: Oh sorry, this has basically already been said.

Well, but such an optimization would also be compatible with a relative pointer as proposed here. So it is fine that code assumes that fields have a consistent offset. Compiler optimizations can only change stuff around if they have proven that there is no code that could exploit this.