Storing A Smaller Length Of A Slice And Accessing Beyond The Stored Length (But Inside The Actual Length)

Is this code UB?

let x = [0,1,2,3];

let y = &x[0..2];

unsafe { 
   let z = *y.as_ptr().add(3);
}

Yes of course. In C-ese it's outside the bounds of the object. As to

why is this UB?

It's UB purely to allow for optimizations. I doubt it will actually be exploited in this particular example, but it could be exploited in non-inlined code that does some bounds checking first (that is, if we hide x from the compiler). It might not be exploited in those cases either, but LLVM is allowed to change that at any time.

There's some relevant discussion in this issue on the Unsafe Code Guidelines repo:

https://github.com/rust-lang/unsafe-code-guidelines/issues/134

If the slice is created with something like <[T]>::split_at_mut(), this pattern will lead to aliasing mutable references.
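
To make that concrete, here is a minimal sketch (the function name halves_are_adjacent is made up for illustration). It only compares addresses and never dereferences out of bounds, so it should itself be free of UB. It shows that the first byte past the left half is the first byte of the right half, which is exclusively borrowed:

```rust
/// Hypothetical helper: reports whether the one-past-the-end address of the
/// left half is the start address of the right half.
fn halves_are_adjacent(buf: &mut [u8], mid: usize) -> bool {
    let (left, right) = buf.split_at_mut(mid);
    // `wrapping_add` only computes an address; nothing is dereferenced here.
    let end_of_left = left.as_ptr().wrapping_add(left.len());
    end_of_left == right.as_ptr()
}

fn main() {
    let mut buf = [0u8, 1, 2, 3];
    assert!(halves_are_adjacent(&mut buf, 2));
    // Reading `*left.as_ptr().add(2)` instead would touch memory that `right`
    // borrows mutably, which is exactly the aliasing problem described above.
}
```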

What if I never call split_at_mut?

I think this discussion is somewhat different, since I am not accessing the array from the first element; I am accessing the array from .as_ptr().

This is not outside of bounds, since the buffer has size 4.

This is UB by the same principle as described here. If you create a raw pointer from a reference, that raw pointer may only access what the reference could have accessed. The reference &x[0..2] can't access the fourth element, so it is UB for the raw pointer you made from it to do so.
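
For contrast, a minimal sketch of a variant that should be fine: derive the raw pointer from the whole array rather than from the sub-slice, and keep the shorter length as a separate number. The function name fourth_via_array_pointer and the variable stored_len are made up for illustration.

```rust
/// Hypothetical helper: reads index 3 through a pointer derived from the
/// whole array, so the pointer is allowed to access all four elements.
fn fourth_via_array_pointer(x: &[i32; 4]) -> i32 {
    // Derived from `x` itself, not from `&x[0..2]`.
    let base: *const i32 = x.as_ptr();
    // SAFETY: index 3 is in bounds of the 4-element array that `base`
    // was derived from.
    unsafe { *base.add(3) }
}

fn main() {
    let x = [0, 1, 2, 3];
    // The smaller "stored" length is just a number kept next to the pointer;
    // it does not shrink what the pointer may access.
    let stored_len = 2usize;
    assert!(stored_len < x.len());
    assert_eq!(fourth_via_array_pointer(&x), 3);
}
```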

As others have already stated, this is UB. If it weren't, I imagine it'd be easy for someone to introduce unsoundness by using a pointer derived from a slice to access memory outside of that slice.

This feels quite similar to the issue in C where it is undefined behaviour to access memory outside an allocation using pointer arithmetic, even if that memory is actually allocated and perfectly usable... The problem comes because the compiler assumes that each allocation is unique and adjusts its pointer aliasing assumptions correspondingly. If a rogue pointer were to access memory other than the region expected by the compiler, you could introduce memory issues because the compiler made code transformations and optimisations assuming they'd be impossible for other code to observe.

So does this also mean that flexible structures aren't supported for FFI? That is, where the last type in a structure is an array with length 1, but the real length is stored elsewhere. For example:

use std::slice;

#[repr(C)]
struct Foo {
    length: usize,
    data: [u8; 1],
}
impl Foo {
    fn data(&self) -> &[u8] {
        unsafe {
            slice::from_raw_parts(self.data.as_ptr(), self.length)
        }
    }
}

That example can only be made sound with raw pointers. The compiler knows that the type of Foo::data is [u8; 1] and generates code under that assumption, so creating a slice that extends beyond that field from a reference to it is UB.
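
A minimal sketch of what the raw-pointer version might look like. The names foo_data, backing_bytes and the Backing struct are made up for illustration; Backing just stands in for whatever larger allocation the C side would really hand over. The point is that the pointer is derived from the whole allocation and no reference to the 1-element data field is ever created.

```rust
use std::{mem, slice};

#[repr(C)]
struct Foo {
    length: usize,
    data: [u8; 1],
}

// Hypothetical stand-in for the real, larger allocation behind a `*const Foo`.
#[repr(C)]
struct Backing {
    length: usize,
    data: [u8; 4],
}

/// # Safety
/// `p` must be derived from the whole allocation, which must hold a `usize`
/// length followed by at least that many bytes, and must outlive `'a`.
unsafe fn foo_data<'a>(p: *const Foo) -> &'a [u8] {
    unsafe {
        // Read the length through the raw pointer; no `&Foo` is created.
        let length = (*p).length;
        // Step over the `usize` header (no padding is needed before `u8`).
        let at_data = p.cast::<u8>().add(mem::size_of::<usize>());
        slice::from_raw_parts(at_data, length)
    }
}

fn backing_bytes(backing: &Backing) -> &[u8] {
    // The pointer comes from `&Backing`, so it may access all of `Backing`.
    let p = (backing as *const Backing).cast::<Foo>();
    // SAFETY: `backing.length` must not exceed the 4 bytes in `Backing::data`.
    unsafe { foo_data(p) }
}

fn main() {
    let backing = Backing { length: 4, data: [10, 20, 30, 40] };
    assert_eq!(backing_bytes(&backing), &[10u8, 20, 30, 40][..]);
}
```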

They are, but you must be very careful with the Rust references used while handling these data structures:

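use std::{mem, ptr, slice};

// Note: `MyRef` and `Erased` are helper types that are not shown in this excerpt.
#[repr(C)]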
pub
struct Foo<Data> {
    length: usize,
    data: Data,
}

impl<const N: usize> Foo<[u8; N]> {
    pub
    fn new (data: [u8; N])
      -> Self
    {
        Self { length: N, data }
    }

    pub
    fn unsize (self: &'_ Foo<[u8; N]>)
      -> MyRef<'_, Foo<Erased>>
    {
        assert_eq!(self.length, N);
        unsafe {
            MyRef::from_ptr(ptr::NonNull::from(self).cast())
        }
    }
}
impl Foo<Erased> {
    pub
    fn data<'__> (self: MyRef<'__, Self>)
      -> &'__ [u8]
    {
        unsafe {
            let length = self.length;
            let at_data: *const u8 =
                self.ptr().as_ptr()
                    .cast::<u8>()
                    .add(mem::size_of::<usize>())
                    // plus pad to alignment (nothing here because u8)
            ;
            slice::from_raw_parts(at_data, length)
        }
    }
}
  • Playground (I have used a few nightly features for the sake of ergonomics, but nothing paramount)

  • the key is that MyRef<'_, T> has different semantics than &'_ T, since it is made of a pointer that can hold more "provenance" than just the T (in this case the pointer originates from unsize)
