Is this self-referential struct unsound?

Hi, I am trying to grasp why the following may need / must use a Pin, but I am not being able to follow the though process.

#[derive(Debug)]
struct A {
    a: Vec<u32>,
    b: *const u8,
}

impl A {
    fn new(a: Vec<u32>) -> Self {
        let b = a.as_ptr() as *const u8;
        Self {
            a,
            b,
        }
    }
    
    pub fn as_slice(&self) -> &[u8] {
        unsafe { std::slice::from_raw_parts(self.b, self.a.len() * 4) }
    }
    
    pub fn vec_ptr(&self) -> *const u32 {
        self.a.as_ptr()
    }
}

fn main() {
    let mut a = A::new(vec![1, 2]);
    let mut b = A::new(vec![3, 4]);
    
    println!("a: {:?}, {:?}", a, a.vec_ptr());
    println!("b: {:?}, {:?}", b, b.vec_ptr());
    println!("swap");
    std::mem::swap(&mut a, &mut b);
    println!("a: {:?}, {:?}", a, a.vec_ptr());
    println!("b: {:?}, {:?}", b, b.vec_ptr());
    drop(a);
    println!("b: {:?}, {:?}", b, b.vec_ptr());
}

Output

a: A { a: [1, 2], b: 0x55954512ead0 }, 0x55954512ead0
b: A { a: [3, 4], b: 0x55954512eaf0 }, 0x55954512eaf0
swap
a: A { a: [3, 4], b: 0x55954512eaf0 }, 0x55954512eaf0
b: A { a: [1, 2], b: 0x55954512ead0 }, 0x55954512ead0
b: A { a: [1, 2], b: 0x55954512ead0 }, 0x55954512ead0

The prints indicate that there is consistency between the pointer A::b and the corresponding address A::vec_ptr, no?

(For the purposes of this exercise, we can assume that A::a is immutable; i.e. it does not grow or shrink)

(Playground)

You don't need to use Pin here because the struct can be freely moved around, since a Vec is heap-allocated (so the address of its data does not depend on the address of the struct). I'm pretty sure this code is sound - but be careful. If you used Box<[u32]> instead of Vec<u32> it might be unsound as Box requires its pointer not to alias, but I don't understand it enough to say for sure.

It is currently impossible to write sound code that relies on Pin for safety, due to the aliasing rules and stacked borrows.

This isn't safe. For example, if Vec grows and has to reallocate, it will have a new address for the data on the heap and your pointer will still point to the old address.

2 Likes

Thanks a lot. That is a great point. I edited the question because my concern was on the self-referential aspect; in my use-case we can assume that the Vec is immutable.

The question emerged because we are trying to remove a level of indirection, by changing Arc<Vec<u32>> to Vec<u32> (still immutable), and are evaluating whether doing this with a pointer around raises a soundness concern.

I'm still not sure that's safe. I'm not sure there are any guarantees that a Vec can't be moved in memory even if the data isn't changed, but someone who's more knowledgable may chime in on that.

If you're trying to remove a level of indirection and have immutable data, how about Arc<[u32]> instead. That should be easier and safer to work with.

Yes, this is safe with the public interface you have exposed here.

Note that creating a &mut [u8] to the data in the vector would invalidate your b pointer for the reasons described here, even if this doesn't cause it to reallocate. For this reason, I would argue that it would be better to design it such that as_slice called a.as_ptr() each time.

4 Likes

I don't think you should be multiplying the length by 4.

The multiply by 4 is ok because the types are u32 and u8 respectively. (and u8 has a smaller alignment than u32)

1 Like

Oh, that makes sense.

The bytes crate gives you a type, Bytes, that's in essence a multiply-owned, thread-safe, immutable byte slice without extra indirection and with a safe interface; you could make your type a wrapper around Bytes and expose accessors that let you read/write u32s from/to it.

Edit: Actually, if you're using Vec<u32> to make sure the pointer is 4-byte aligned, I'm not certain that can be achieved with Bytes.

Is it the guarantee that Vec won't reallocate unless pushing to it when len == capacity that causes it to be safe to keep around a pointer to the data as long as the original Vec is alive (and as long as a mutable reference isn't taken)?

Yeah, vectors are guaranteed to only reallocate if you call a method that would increase its length past its capacity (or the reserve methods).

1 Like

If the data is immutable, you could remove the extra level of indirection by using Arc<[u32]> instead:

Arc::from(a.into_boxed_slice())
1 Like

A potential problem with this is that it copies the data (see here), which is required because the backing allocation for Arc<T> has a different layout from that of T. Unfortunately (afaict) there's no simple way to get some data into an Arc<[u8]> or similar without copying it in the process.

1 Like

It would be nice if Vec and Arc could at least put their payloads at the same offset from a memory page boundary: That copy would then be reduced to a simple page table update.