Are sound self-referential structs possible without boxing?

alice · January 13, 2021, 5:12pm

The following example of how one might try to implement a self-referential struct is unsound.

use std::cell::UnsafeCell;
use std::pin::Pin;
use std::marker::PhantomPinned;

struct SelfRef {
    value: UnsafeCell<u64>,
    ptr: *const u64,
    _pin: PhantomPinned,
}

impl SelfRef {
    fn new(value: u64) -> Self {
        Self {
            value: UnsafeCell::new(value),
            ptr: std::ptr::null(),
            _pin: PhantomPinned,
        }
    }
    fn set_ptr(self: Pin<&mut Self>) {
        let me = unsafe { Pin::into_inner_unchecked(self) };
        let ptr = &mut me.ptr;
        let value = me.value.get();
        *ptr = value;
    }
    fn get_value(self: Pin<&mut Self>) -> u64 {
        let me = unsafe { Pin::into_inner_unchecked(self) };
        assert!(!me.ptr.is_null());
        unsafe { *me.ptr }
    }
}

fn main() {
    let self_ref = SelfRef::new(10);
    futures::pin_mut!(self_ref);
    
    self_ref.as_mut().set_ptr();
    println!("{}", self_ref.get_value());
}

playground

If you run the above with miri, you will see that it encounters UB. As I understand it, this is because when we call get_value, we create a unique reference to SelfRef, and hence we assert exclusive access to the struct, including all of its fields. This invalidates the *const u64 we stored in the struct, as it was created before we asserted unique access to value.

Note that in the above I use that an &mut UnsafeCell<T> still asserts exclusive access to the T. As far as I can see, this is indeed the case. The assumption that UnsafeCell turns off is the opposite, namely that an &UnsafeCell<T> does not assert that the T is immutable.

What does the compiler even do when it constructs a future with async/await? As far as I can see there is simply no way to fix the above code.

(Note that if we allow boxing of the value field, it is ok as the &mut SelfRef does not assert exclusive access through raw pointers recursively.)

jschievink · January 13, 2021, 5:17pm

I believe this is being discussed in https://github.com/rust-lang/rust/issues/63818, if I'm not mistaken

steffahn · January 13, 2021, 5:36pm

As far as making miri happy is concerned, something like

fn get_value(self: Pin<&mut Self>) -> u64 {
    let me = unsafe { self.get_unchecked_mut() };
    let ptr = me.ptr;
    assert!(!ptr.is_null());
    let _me = me as *mut _;
    unsafe { *ptr }
}

is enough.

Furthermore, finding an approach that would make my personal intuition of “sound unsafe code” happy, one could go this far:

fn get_value(self: Pin<&mut Self>) -> u64 {
    let me = unsafe { self.get_unchecked_mut() };
    let ptr = me.ptr;
    assert!(!ptr.is_null());
    let _me = me as *mut _ as usize;
    let ptr_usize = ptr as usize;
    unsafe { *(ptr_usize as *const u64) }
}

Also, I really have to put reading into the details of stacked borrows higher on my TODO list.

alice · January 13, 2021, 5:42pm

Yeah, but that's just this again. It would still be unsound with LLVM's noalias turned on for mutable references.

That doesn't seem sound to me. Maybe if you added ptr_usize - _me to the me pointer, it would be better.

steffahn · January 13, 2021, 5:49pm

Oh, I see, I hadn’t really read that post yet.

Does replacing ptr_usize with _me + ptr_usize - _me (all of these are integers!) really make a semantic difference? Also, this is somewhat of an LLVM question now, right? (Plus the question of what kind of LLVM Rust emits..)

SkiFire13 · January 13, 2021, 5:50pm

Box<T> contains an Unique<T> which should be special cased for this:

github.com

rust-lang/rust/blob/master/library/core/src/ptr/unique.rs#L11-L15


      
          /// Unlike `*mut T`, `Unique<T>` behaves "as if" it were an instance of `T`.
          /// It implements `Send`/`Sync` if `T` is `Send`/`Sync`. It also implies
          /// the kind of strong aliasing guarantees an instance of `T` can expect:
          /// the referent of the pointer should not be modified without a unique path to
          /// its owning Unique.

alice · January 13, 2021, 5:51pm

Right, to do it with boxing, you must Box::into_raw and store it as a raw pointer, which you manually drop in the destructor.

alice · January 13, 2021, 5:51pm

I really meant adding the difference between _me and ptr_usize to the original me pointer, avoiding the integer schenaniganz.

steffahn · January 13, 2021, 5:53pm

Ah, okay.. well, I was thinking ahead here to a situation where we don’t know anymore if ptr points directly to a field of me or to some (other) heap location instead. I can imagine self-referential structs where both possibilities can happen.

alice · January 13, 2021, 5:56pm

Just casting to integers and back to pointers seems incredibly sketchy to me. Anyway, even if we considered adding offsets to me for this case, there are other cases that also ought to work when pinned, but have the same issue without this workaround.

use std::cell::UnsafeCell;
use std::pin::Pin;
use std::marker::PhantomPinned;

struct PairSelfRef {
    value: UnsafeCell<u64>,
    ptr: *const u64,
    _pin: PhantomPinned,
}

fn link_up(left: Pin<&mut PairSelfRef>, right: Pin<&mut PairSelfRef>) {
    let left = unsafe { left.get_unchecked_mut() };
    let right = unsafe { right.get_unchecked_mut() };
    left.ptr = right.value.get();
    right.ptr = left.value.get();
}

impl PairSelfRef {
    fn new(value: u64) -> Self {
        Self {
            value: UnsafeCell::new(value),
            ptr: std::ptr::null(),
            _pin: PhantomPinned,
        }
    }
    fn get_value(self: Pin<&mut Self>) -> u64 {
        let me = unsafe { Pin::into_inner_unchecked(self) };
        assert!(!me.ptr.is_null());
        unsafe { *me.ptr }
    }
}

fn main() {
    let pair_1 = PairSelfRef::new(10);
    let pair_2 = PairSelfRef::new(20);
    futures::pin_mut!(pair_1);
    futures::pin_mut!(pair_2);
    
    link_up(pair_1.as_mut(), pair_2.as_mut());
    
    println!("{}", pair_1.get_value());
    println!("{}", pair_2.get_value());
}

playground

SkiFire13 · January 13, 2021, 5:58pm

I feel like this could have changed if UnsafeCell::get_mut wasn't stabilized.

Kestrer · January 13, 2021, 5:59pm

You may also want to use the aliasable crate which provides an AliasableBox so you can reduce usage of unsafe and not have to implement Send + Sync manually.

alice · January 13, 2021, 5:59pm

@Kestrer The case with boxing is not that interesting to me. I mostly want to know if there is any way to do this soundly without boxing.

Kestrer · January 13, 2021, 6:07pm

This issue is also being tracked in the UCG repository. I opened an issue similar to this on Miri where I managed to satify it using a similar workaround to @steffahn's, but RalfJung did mention that it wasn't a valid way to implement self-referential types. So I don't think this is possible soundly (even if Miri does not complain).

steffahn · January 13, 2021, 6:16pm

FYI, in that example’s API, one could drop one of the PairSelfRef before calling get_value on the other, dereferencing a dangling pointer.

alice · January 13, 2021, 6:17pm

Good point, but you can fix that with the proper destructors.

Kestrer · January 13, 2021, 6:23pm

I think it would still be unsound because destructors are not guaranteed to run. For example:

let mut pair_1 = Box::pin(PairSelfRef::new(10));
let mut pair_2 = Box::pin(PairSelfRef::new(20));

link_up(pair_1.as_mut(), pair_2.as_mut());

std::mem::forget(pair_1);

println!("{}", pair_2.get_value());

alice · January 13, 2021, 6:23pm

Pin guarantees that if the destructors don't run, then the memory is leaked and hence the pointer would not be dangling.

RalfJung · January 13, 2021, 6:48pm

FWIW, there's now a way to test this: the -Zmiri-track-raw-pointers flag. It will not work on code that use int-to-ptr casts, but for raw-ptr-only code it tracks provenance properly.

Of course that example code uses int-to-ptr casts precisely to get around the provenance tracking. This will have to be allowed somehow and it will be interesting to see what LLVM will do here...

You are very right to be cautious.

system · April 13, 2021, 6:48pm

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.

Topic		Replies	Views
Self referential struct with references help	2	419	April 24, 2024
Self-referential struct - request for a review help	4	461	October 23, 2021
Proper way to create self referential structs (with or without `Pin`) help	15	1263	December 20, 2021
Help needed: self referential struct help	7	973	January 28, 2020
Is this self-referential struct unsound? help	15	729	June 21, 2021

Are sound self-referential structs possible without boxing?

Related Topics