Is this self-referential struct unsound?

jorgecarleitao · March 22, 2021, 4:14pm

Hi, I am trying to grasp why the following may need / must use a Pin, but I am not being able to follow the though process.

#[derive(Debug)]
struct A {
    a: Vec<u32>,
    b: *const u8,
}

impl A {
    fn new(a: Vec<u32>) -> Self {
        let b = a.as_ptr() as *const u8;
        Self {
            a,
            b,
        }
    }
    
    pub fn as_slice(&self) -> &[u8] {
        unsafe { std::slice::from_raw_parts(self.b, self.a.len() * 4) }
    }
    
    pub fn vec_ptr(&self) -> *const u32 {
        self.a.as_ptr()
    }
}

fn main() {
    let mut a = A::new(vec![1, 2]);
    let mut b = A::new(vec![3, 4]);
    
    println!("a: {:?}, {:?}", a, a.vec_ptr());
    println!("b: {:?}, {:?}", b, b.vec_ptr());
    println!("swap");
    std::mem::swap(&mut a, &mut b);
    println!("a: {:?}, {:?}", a, a.vec_ptr());
    println!("b: {:?}, {:?}", b, b.vec_ptr());
    drop(a);
    println!("b: {:?}, {:?}", b, b.vec_ptr());
}

Output

a: A { a: [1, 2], b: 0x55954512ead0 }, 0x55954512ead0
b: A { a: [3, 4], b: 0x55954512eaf0 }, 0x55954512eaf0
swap
a: A { a: [3, 4], b: 0x55954512eaf0 }, 0x55954512eaf0
b: A { a: [1, 2], b: 0x55954512ead0 }, 0x55954512ead0
b: A { a: [1, 2], b: 0x55954512ead0 }, 0x55954512ead0

The prints indicate that there is consistency between the pointer A::b and the corresponding address A::vec_ptr, no?

(For the purposes of this exercise, we can assume that A::a is immutable; i.e. it does not grow or shrink)

(Playground)

Kestrer · March 22, 2021, 4:25pm

You don't need to use Pin here because the struct can be freely moved around, since a Vec is heap-allocated (so the address of its data does not depend on the address of the struct). I'm pretty sure this code is sound - but be careful. If you used Box<[u32]> instead of Vec<u32> it might be unsound as Box requires its pointer not to alias, but I don't understand it enough to say for sure.

It is currently impossible to write sound code that relies on Pin for safety, due to the aliasing rules and stacked borrows.

drewkett · March 22, 2021, 4:31pm

This isn't safe. For example, if Vec grows and has to reallocate, it will have a new address for the data on the heap and your pointer will still point to the old address.

jorgecarleitao · March 22, 2021, 4:47pm

Thanks a lot. That is a great point. I edited the question because my concern was on the self-referential aspect; in my use-case we can assume that the Vec is immutable.

The question emerged because we are trying to remove a level of indirection, by changing Arc<Vec<u32>> to Vec<u32> (still immutable), and are evaluating whether doing this with a pointer around raises a soundness concern.

drewkett · March 22, 2021, 4:55pm

I'm still not sure that's safe. I'm not sure there are any guarantees that a Vec can't be moved in memory even if the data isn't changed, but someone who's more knowledgable may chime in on that.

If you're trying to remove a level of indirection and have immutable data, how about Arc<[u32]> instead. That should be easier and safer to work with.

alice · March 22, 2021, 5:10pm

Yes, this is safe with the public interface you have exposed here.

Note that creating a &mut [u8] to the data in the vector would invalidate your b pointer for the reasons described here, even if this doesn't cause it to reallocate. For this reason, I would argue that it would be better to design it such that as_slice called a.as_ptr() each time.

gretchenfrage · March 22, 2021, 5:35pm

I don't think you should be multiplying the length by 4.

alice · March 22, 2021, 5:35pm

The multiply by 4 is ok because the types are u32 and u8 respectively. (and u8 has a smaller alignment than u32)

gretchenfrage · March 22, 2021, 5:36pm

Oh, that makes sense.

cole-miller · March 22, 2021, 6:04pm

The bytes crate gives you a type, Bytes, that's in essence a multiply-owned, thread-safe, immutable byte slice without extra indirection and with a safe interface; you could make your type a wrapper around Bytes and expose accessors that let you read/write u32s from/to it.

Edit: Actually, if you're using Vec<u32> to make sure the pointer is 4-byte aligned, I'm not certain that can be achieved with Bytes.

drewkett · March 22, 2021, 6:39pm

Is it the guarantee that Vec won't reallocate unless pushing to it when len == capacity that causes it to be safe to keep around a pointer to the data as long as the original Vec is alive (and as long as a mutable reference isn't taken)?

alice · March 22, 2021, 6:42pm

Yeah, vectors are guaranteed to only reallocate if you call a method that would increase its length past its capacity (or the reserve methods).

2e71828 · March 22, 2021, 6:48pm

If the data is immutable, you could remove the extra level of indirection by using Arc<[u32]> instead:

Arc::from(a.into_boxed_slice())

cole-miller · March 22, 2021, 6:55pm

A potential problem with this is that it copies the data (see here), which is required because the backing allocation for Arc<T> has a different layout from that of T. Unfortunately (afaict) there's no simple way to get some data into an Arc<[u8]> or similar without copying it in the process.

2e71828 · March 22, 2021, 7:12pm

It would be nice if Vec and Arc could at least put their payloads at the same offset from a memory page boundary: That copy would then be reduced to a simple page table update.

system · June 21, 2021, 12:18am

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.

Topic		Replies	Views
Help needed: self referential struct help	7	903	January 28, 2020
Self-referential struct - request for a review help	4	404	October 23, 2021
Self referential struct with references help	1	179	January 25, 2024
Some confusing about Pin & Box help	6	878	July 3, 2022
Self-referential struct with help of Pin? help	6	561	February 8, 2022

Is this self-referential struct unsound?

Related Topics