Implementing a potentially foreign allocated, potentially-mutable memory region

Hi,

I am trying to encapsulate some difficult-for-compiler-to-prove invariants in a safe struct.

We have data that may either be allocated by Vec or by an external allocator (e.g. ffi). We want to retain the ability to mutate it when allocated by Vec. I.e.

  • can be initialized from either a Vec or an immutable ptr,len,shared owner
  • is as fast to deref as Vec
  • is optionally mutable (when initialized from Vec)
  • it is sound and does not leak

This is what I have atm:

use std::mem::ManuallyDrop;
use std::ops::Deref;
use std::panic::RefUnwindSafe;
use std::{ptr::NonNull, sync::Arc};

/// Mode of deallocating memory regions
enum Allocation {
    /// Native allocation
    Native,
    // A foreign allocator and its ref count
    Foreign(Arc<dyn RefUnwindSafe + Send + Sync>),
}

/// A continuous memory region that may be allocated externally.
///
/// In the most common case, this is created from [`Vec`].
/// However, this region can also be allocated by a foreign allocator.
pub struct Bytes<T> {
    /// An implementation using an `enum` of a `Vec` or a foreign pointer is not used
    /// because `deref` is at least 50% more expensive than the deref of a `Vec`.
    data: ManuallyDrop<Vec<T>>,
    /// the region was allocated
    allocation: Allocation,
}

impl<T> Bytes<T> {
    /// Takes ownership of an allocated memory region `[ptr, ptr+len[`,
    /// # Safety
    /// This function is safe iff:
    /// * the region is properly allocated in that a slice can be safely built from it.
    /// * the region is immutable.
    /// # Implementation
    /// This function leaks iff `owner` does not deallocate the region when dropped.
    #[inline]
    pub unsafe fn from_owned(
        ptr: std::ptr::NonNull<T>,
        len: usize,
        owner: Arc<dyn RefUnwindSafe + Send + Sync>,
    ) -> Self {
        // This line is technically outside the assumptions of `Vec::from_raw_parts`, since
        // `ptr` was not allocated by `Vec`. However, one of the invariants of this struct
        // is that we do not expose this region as a `Vec`; we only use `Vec` on it to provide
        // immutable access to the region (via `Vec::deref` to `&[T]`).
        // MIRI does not complain, which seems to agree with the line of thought.
        let data = Vec::from_raw_parts(ptr.as_ptr(), len, len);
        let data = ManuallyDrop::new(data);

        Self {
            data,
            allocation: Allocation::Foreign(owner),
        }
    }

    /// The length of the region
    #[inline]
    pub fn len(&self) -> usize {
        self.data.len()
    }

    /// The pointer to the region
    #[inline]
    pub fn ptr(&self) -> NonNull<T> {
        unsafe { NonNull::new_unchecked(self.data.as_ptr() as *mut T) }
    }

    /// Returns a `Some` mutable reference of [`Vec<T>`] iff this was initialized
    /// from a [`Vec<T>`] and `None` otherwise.
    #[inline]
    pub fn get_vec(&mut self) -> Option<&mut Vec<T>> {
        match &self.allocation {
            Allocation::Foreign(_) => None,
            Allocation::Native => Some(&mut self.data),
        }
    }
}

impl<T> Drop for Bytes<T> {
    #[inline]
    fn drop(&mut self) {
        match self.allocation {
            Allocation::Foreign(_) => {
                // The ref count of the foreign is reduced by one
                // we can't deallocate `Vec` here since the region was allocated by
                // a foreign allocator
            }
            Allocation::Native => {
                let data = std::mem::take(&mut self.data);
                let _ = ManuallyDrop::into_inner(data);
            }
        }
    }
}

impl<T> std::ops::Deref for Bytes<T> {
    type Target = [T];

    #[inline]
    fn deref(&self) -> &[T] {
        &self.data
    }
}

impl<T: PartialEq> PartialEq for Bytes<T> {
    #[inline]
    fn eq(&self, other: &Bytes<T>) -> bool {
        self.deref() == other.deref()
    }
}

impl<T> From<Vec<T>> for Bytes<T> {
    #[inline]
    fn from(data: Vec<T>) -> Self {
        Self {
            data: ManuallyDrop::new(data),
            allocation: Allocation::Native,
        }
    }
}

In our case, we have an extra invariant that T: bytesmuck::Pod. I think that this is not needed here, but saying that we can significantly restrict T.

Any thoughts about this approach and its consequences?

You can't use Vec with data allocated outside of Rust's allocator, because Vec may want to reallocate/extend backing storage in lots of cases.

I don't see why you'd try to keep Vec as the owner. You can get its raw backing storage, and interact with Rust's allocator and other allocators directly. Basically, reimplement Vec from scratch by holding a raw pointer and some allocator reference (maybe even reuse GlobalAlloc trait?). With a deref to slice you don't need to reimplement much.

slice_from_raw_parts_mut is fine to use with a pointer from any allocator, but Vec::from_raw_parts is strictly Rust-allocator-only.

3 Likes

Thanks!

The struct does not allow reallocate or deallocate since the vec is only presented as Deref. Regardless, if I understand your point is that instead of using Vec, just keep Vec's parts and reconstruct the vec if we need a vec, therefore avoiding that risk entirely.

This argument:

Is not correct. Miri does not complain because you don't have UB right now, but you may definitely have in the future. Theoretically, if you violate the guarantees of from_raw_parts(), even len() could have UB. Even from_raw_parts() itself. It is very unlikely, but relying on that is wrong, especially when as @kornel said you can do just as easy without Vec.

I'm not entirely sure this code is unsound, because I'm not sure it violates from_raw_parts()'s safety requirements. The docs are a bit vague. I think from_raw_parts() is UB if the Vec can't grow/shrink/dealloc the given memory using the given allocator (the default allocator, in this case), but the docs only say that (emphasis mine):

  • ptr needs to have been previously allocated via String/Vec<T> (at least, it’s highly likely to be incorrect if it wasn’t).

and:

The ownership of ptr is effectively transferred to the Vec which may then deallocate, reallocate or change the contents of memory pointed to by the pointer at will.

2 Likes

Thanks a lot for the insight. I understand what you are saying.

I did audited Vecs implementation prior to writing that comment. The only function we use with a foreign pointer is deref and I doubt Vec will ever reallocate or deallocate on Deref since that would be a major backward incompatible change on std.

With that said, I did try to fix this with the following:

use std::ops::Deref;
use std::panic::RefUnwindSafe;
use std::{ptr::NonNull, sync::Arc};

/// Mode of deallocating memory regions
enum Allocation {
    /// Native allocation
    Native(usize),
    // A foreign allocator and its ref count
    Foreign(Arc<dyn RefUnwindSafe + Send + Sync>),
}

// this is still unstable, so copying it here
fn into_raw_parts<T>(this: Vec<T>) -> (*mut T, usize, usize) {
    let mut me = std::mem::ManuallyDrop::new(this);
    (me.as_mut_ptr(), me.len(), me.capacity())
}

/// A continuous memory region that may be allocated externally.
///
/// In the most common case, this is created from [`Vec`].
/// However, this region can also be allocated by a foreign allocator.
pub struct Bytes<T> {
    /// An implementation using an `enum` of a `Vec` or a foreign pointer is not used
    /// because `deref` is at least 50% more expensive than the deref of a `Vec`.
    ptr: *mut T,
    len: usize,
    /// the region was allocated
    allocation: Allocation,
}

impl<T> Bytes<T> {
    /// Takes ownership of an allocated memory region `[ptr, ptr+len[`,
    /// # Safety
    /// This function is safe iff:
    /// * the region is properly allocated in that a slice can be safely built from it.
    /// * the region is immutable.
    /// # Implementation
    /// This function leaks iff `owner` does not deallocate the region when dropped.
    #[inline]
    pub unsafe fn from_owned(
        ptr: std::ptr::NonNull<T>,
        len: usize,
        owner: Arc<dyn RefUnwindSafe + Send + Sync>,
    ) -> Self {
        Self {
            ptr: ptr.as_ptr(),
            len,
            allocation: Allocation::Foreign(owner),
        }
    }

    /// The length of the region
    #[inline]
    pub fn len(&self) -> usize {
        self.len
    }

    /// The pointer to the region
    #[inline]
    pub fn ptr(&self) -> NonNull<T> {
        unsafe { NonNull::new_unchecked(self.ptr) }
    }

    /// Returns a `Some` mutable reference of [`Vec<T>`] iff this was initialized
    /// from a [`Vec<T>`] and `None` otherwise.
    #[inline]
    pub fn get_vec(&mut self) -> Option<&mut Vec<T>> {
        match self.allocation {
            Allocation::Foreign(_) => None,
            Allocation::Native(capacity) => {
                Some(unsafe { &mut Vec::from_raw_parts(self.ptr, self.len, capacity) })
            }
        }
    }
}

impl<T> Drop for Bytes<T> {
    #[inline]
    fn drop(&mut self) {
        match self.allocation {
            Allocation::Foreign(_) => {
                // The ref count of the foreign is reduced by one
                // we can't deallocate `Vec` here since the region was allocated by
                // a foreign allocator
            }
            Allocation::Native(capacity) => {
                unsafe { Vec::from_raw_parts(self.ptr, self.len, capacity) };
            }
        }
    }
}

impl<T> std::ops::Deref for Bytes<T> {
    type Target = [T];

    #[inline]
    fn deref(&self) -> &[T] {
        // core invariant of this struct
        unsafe { std::slice::from_raw_parts(self.ptr, self.len) }
    }
}

impl<T: PartialEq> PartialEq for Bytes<T> {
    #[inline]
    fn eq(&self, other: &Bytes<T>) -> bool {
        self.deref() == other.deref()
    }
}

impl<T> From<Vec<T>> for Bytes<T> {
    #[inline]
    fn from(data: Vec<T>) -> Self {
        let (ptr, len, capacity) = into_raw_parts(data);
        Self {
            ptr,
            len,
            allocation: Allocation::Native(capacity),
        }
    }
}

unsafe impl<T: Send> Send for Bytes<T> {}
unsafe impl<T: Sync> Sync for Bytes<T> {}

However, I could not make this work due to

/// Returns a `Some` mutable reference of [`Vec<T>`] iff this was initialized
    /// from a [`Vec<T>`] and `None` otherwise.
    #[inline]
    pub fn get_vec(&mut self) -> Option<&mut Vec<T>> {
        match self.allocation {
            Allocation::Foreign(_) => None,
            Allocation::Native(capacity) => {
                Some(unsafe { &mut Vec::from_raw_parts(self.ptr, self.len, capacity) })
            }
        }
    }

Vec is now a temporary. This is a bit limiting since it does not allow us to move the Vector out of Bytes - we can only expose &mut[T] in this representation.

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.