Rust wrapper around C flexible array

First of all, your code has triggered Miri, a Rust interpreter designed to detect as much UB as possible:

  • You can run Miri, in the Playground, by having an fn main() and using the Tools on the top-right menu.

I will thus go over fixing your code, but keep in mind that flexible array buffers don't play well with Rust semantics, and this kind of "extra indirection" (which in practice can be softened by marking the functions #[inline], or by defining your own types for the shared ref and exclusive ref types (the former being Copy, the latter offering a .reborrow() method; the ergonomics are then not great)) is thus often unavoidable.

Buffer is quite straight-forward to define:

#[repr(C)]
struct Buffer {
    metadata: u32,
    data: *mut c_void,
}

BufferList is indeed quite trickier. You cannot use a fixed length (one that may be too short or too long) while also using "high-level Rust pointers" (&, &mut, Box, etc.) to such Buffer.

From having had to deal with those in the pasts, there are two approaches:

  • define BufferList as a DST (dynamically sized-type):

    #[repr(C)] // guarantees ordering, but does not really have a meaning in C
    struct BufferList {
        num_buffers: u32,
        buffers: [Buffer],
    }
    

    the complication then comes from creating a high-level Rust pointer to that, which will be a fat pointer, i.e., a 2-pointer-wide pair / struct with the address of the first elem (if any), and the elem count (yes, that will be redundant, but that's the price to pay, currently, with high-level Rust pointers to DSTs).

  • Never expose BufferList itself to Rust code as an arbitrary pointee, but rather create your own newtype representing a slim implementation of a type with Box<BufferList> semantics. This is what you have done, and good job with going in that direction, not everybody does it (they go with the first approach with a fake fixed length, which is UB). Sadly you got the implementation wrong, but we'll now see how to handle that part.

The "Box<BufferList>" newtype w/ hand-rolled implementation

In your implementation, the pointee is now called BufferListInternal, and that newtype wrapper over an owning pointer is BufferList.

The offset_of_buffers is a bit of a shortcut that doesn't generalize very well; you are relying on size_of::<u32>() (metadata field) being smaller or equal than the alignment of BufferListInternal (otherwise that computation of the offset is wrong). This is true in your case, but it'd warrant some comment.

This is the part where all hell breaks loose :sweat_smile::

  • you don't type-annotate the construction of the Vec, so subtle inference changes could change the semantics in very wrong ways. Here, for instance, the fact that this compiles, means that inference did lead to a single candidate type, which given the pointer cast, must have been a BufferListInternal. The MIR output (available in the top-left menu of the Playground) confirms this:

    This means that we are heap-allocating an uninitialized buffer of size(in bytes) BufferListInternals elements, thus having a size, in bytes, of size * size_of::<BufferListInternal>(). "Luckily", this just leads to over-allocating: since the latter is bigger than what you intended, you won't happen to read nor write past the allocation.

  • The second problem comes from destructors and temporaries: by doing Vec::new() / Vec::with_capacity() followed by an .as_mut_ptr() (which borrows the Vec, and yields a lifetime-disconnected (still borrowing) pointer) Rust:

    1. Vec::with_capacity() → It allocates your buffer;

      Since it is not bound to any local variable / name, Rust creates its own temporary that will be dropped at the end of the enclosing statement. In the following diagram, that local is called _15.

    2. Vec::as_mut_ptr() → Rust exclusively borrows the vec ( &mut Vec) to yield a pointer to the buffer owned by the Vec. Same as before, it uses its own temporary to hold the obtained pointer (called _13).

    3. It then writes that obtained pointer in the .inner field (.0) of the to-be-returned value (_0, of type BufferList).

    4. This marks the end of the aforementioned enclosing statement, at which point Rust cleans up its temporaries, leading to the Vec temporary (_15) being drop-ped! This deallocates the just allocated buffer, which means that the .inner pointer dangles as soon as it is created! If this had been a regular borrow, Rust would have errored about a "local does not live long enough" error, but given the lifetime-erased nature of the obtained pointer, Rust no longer knows that such pointer is borrowing from a deallocated Vec, and thus says nothing. This is the same problem as in CString/.as_ptr() is incredibly unsafe :(.

    Hence the UB detected by MIRI.

The "cleanest" (imho) way to write this pattern

The correct way of writing that function is by relying heavily on Layout and its convenience methods, as well as the following very handy type definition:

#[repr(C)] // required for the alloc layout shenanigans!
struct BufferList<
    Buffers // make the Rust type generic over its trailing field
    : ?Sized  // with support for DSTs (here, mainly, `[Buffer]`)
    = [Buffer] // add a convenient default parameter
>
{
    num_buffers: u32,
    buffers: Buffers,
}

This way BufferList, with no params provided, represents the type with a [Buffer] field I was talking about at the beginning of my post.

You can already see how convenient this def is, when you see you can write:

let boxed_buffer_list: Box<BufferList> = Box::new(BufferList {
    num_buffers: 2,
    buffers: [ Buffer { … }, Buffer { … }, ], // : [Buffer; 2]
})/* coercion to a fat pointer */;

to which we can already add:

impl<Buffers : ?Sized> BufferList<Buffers> {
    pub
    fn as_ptr (self: &'_ Self)
      -> *const BufferList<[Buffer; 1]>
    {
        self
            as *const Self // remove the lifetime and the "higher-level-ness" (still a fat pointer)
            as *const _ // slim pointer (trash away the len part of the fat pointer)
    }

    pub
    fn as_mut_ptr (self: &'_ mut Self)
      -> *mut BufferList<[Buffer; 1]>
    {
        self as *mut Self as *mut _
    }
}

This will enable you to use this type across FFI.

With the very near-future min_const_generics, we will be able to add a stack / inline constructor:

impl<const N: usize> BufferList<[Buffer; N]> {
    pub
    fn new (buffers: [Buffer; N])
      -> Self
    {
        Self {
            num_buffers: N.try_into().expect("Count overflowed `u32`"),
            buffers,
       }
    }

    // Heap-allocate and then fatten the obtained pointer
    pub
    fn boxed (self: Self)
      -> Box<BufferList /* <[Buffer]> */>
    {
        Box::new(self)
    }
}

And now, for the runtime length initialization using heap allocation, as you attempted to do:

extern crate alloc as alloc_crate;

use ::core::{
    convert::{TryFrom, TryInto},
    ptr,
};
use ::alloc_crate::alloc::{self, Layout};

impl BufferList {
    pub
    fn new (iterator: impl ExactSizeIterator<Item = Buffer>)
      -> Box<Self>
    { unsafe {
        let count = iterator.len();
        let count_u32 = u32::try_from(count).expect("Count overflowed `u32`");
        // layout for the `num_buffers` field + padding for the `buffers`.
        let num_buffers_layout = Layout::new::<BufferList<[Buffer; 0]>>();
        let buffers_layout = Layout::array::<Buffer>(count).unwrap();
        let (self_layout, buffers_offset) =
            num_buffers_layout.extend(buffers_layout).unwrap()
        ;
        let self_layout = self_layout.pad_to_align(); // in case `num_buffers_layout` had a higher alignment than that of `Buffer`.
        let ptr = alloc::alloc(self_layout).cast::<u8>();
        if ptr.is_null() {
            enum Diverged {}
            let _: Diverged = alloc::handle_alloc_error(self_layout);
        }
        let (ptr_to_num_buffers, mut ptr_to_buffers) = (
            ptr.cast::<u32>(),
            ptr.add(buffers_offset).cast::<Buffer>(),
        );
        ptr_to_num_buffers.write(count_u32);
        // We need to use `zip` as extra safety since
        // `ExactSizeIterator` is allowed to lie (the trait is not `unsafe`).
        let ref mut idxs = 0 .. count;
        idxs.zip(iterator).for_each(|(_, buffer)| {
            // zip prevents the iterator from yielding too many elems
            ptr_to_buffers.write(buffer);
            ptr_to_buffers = ptr_to_buffers.add(1);
        });
        // the iterator may also have yielded not enough elems
        assert!(idxs.next().is_none(), "ExactSizeIterator lied");
        // at this point, we know we have correctly
        // initialized and written everything 🙌
        // Time to fatten our pointer… (redundantly embed the length)
        let fake_fat_ptr: *mut [BufferList<Buffer>] = ptr::slice_from_raw_parts_mut(ptr.cast(), count);
        let actual_fat_ptr: *mut BufferList<[Buffer]> = fake_fat_ptr as _;
        Box::<Self>::from_raw(actual_fat_ptr)
    }}
}
15 Likes