Unsized generic type with no bit validity requirements?

I'm trying to make a type that satisfies the following properties:

/// A type with the same size as `T` and with no alignment or bit validity
/// requirements (any sequence of the appropriate number of bytes is a
/// valid instance).
struct MaybeValid<T: ?Sized> { ... }

If we require T: Sized, this is easy to accomplish like so:

use core::mem::MaybeUninit;

#[repr(C, packed)]
struct MaybeValid<T> {
    // INVARIANT: All of the bytes of this field are initialized.
    inner: MaybeUninit<T>,
}
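Here is a quick check of that Sized version, as a sketch. The `from_bytes` and `as_bytes` helpers are hypothetical additions and not part of the original type. Because the field is a `MaybeUninit<T>` and the struct is packed, the wrapper is exactly `size_of::<T>()` bytes with alignment 1, and a byte such as `2` is accepted even for `MaybeValid<bool>`.

```rust
use std::mem::{size_of, MaybeUninit};

#[repr(C, packed)]
pub struct MaybeValid<T> {
    // INVARIANT: All of the bytes of this field are initialized.
    inner: MaybeUninit<T>,
}

impl<T> MaybeValid<T> {
    /// Hypothetical constructor: copies `size_of::<T>()` bytes verbatim,
    /// with no alignment or validity check. Panics on a length mismatch.
    pub fn from_bytes(bytes: &[u8]) -> Self {
        assert_eq!(bytes.len(), size_of::<T>());
        let mut inner = MaybeUninit::<T>::uninit();
        // SAFETY: `inner` has room for exactly `size_of::<T>()` bytes, and a
        // byte-wise copy needs neither `T`'s alignment nor its validity.
        unsafe {
            std::ptr::copy_nonoverlapping(
                bytes.as_ptr(),
                inner.as_mut_ptr().cast::<u8>(),
                bytes.len(),
            );
        }
        MaybeValid { inner }
    }

    /// Hypothetical accessor: views the wrapper as its raw bytes.
    pub fn as_bytes(&self) -> &[u8] {
        // SAFETY: `Self` is exactly `size_of::<T>()` bytes (packed, single
        // field), all initialized per the INVARIANT, and `u8` has alignment 1.
        unsafe { std::slice::from_raw_parts((self as *const Self).cast::<u8>(), size_of::<T>()) }
    }
}
```

Since `MaybeUninit` offers no niche, `Option<MaybeValid<bool>>` also needs a separate tag byte.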

However, if we try this for T: ?Sized, it doesn't work because MaybeUninit requires T: Sized. And we can't just implement our own MaybeUninit without this requirement, as unions don't currently support unsized fields.
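For reference, a hand-rolled union only works in the Sized case. The sketch below uses a made-up name, `MyMaybeUninit`. It mirrors how std's `MaybeUninit` is built from a union with a `()` field and a `ManuallyDrop<T>` field. Adding `T: ?Sized` to it is rejected by the compiler, because union fields must currently be sized.

```rust
use std::mem::ManuallyDrop;

/// Hypothetical hand-rolled `MaybeUninit`. This compiles only because `T`
/// is implicitly `Sized`; union fields cannot have a dynamically sized type.
#[repr(C)]
pub union MyMaybeUninit<T> {
    uninit: (),
    value: ManuallyDrop<T>,
}

impl<T> MyMaybeUninit<T> {
    /// Creates an instance whose `T` bytes are uninitialized.
    pub fn uninit() -> Self {
        MyMaybeUninit { uninit: () }
    }

    /// Creates an instance holding a valid `T`.
    pub fn new(value: T) -> Self {
        MyMaybeUninit { value: ManuallyDrop::new(value) }
    }

    /// # Safety
    /// The `value` field must hold a valid `T`.
    pub unsafe fn assume_init(self) -> T {
        // Moving a field out of an owned union is allowed, but it is unsafe.
        ManuallyDrop::into_inner(unsafe { self.value })
    }
}
```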

Can anyone think of a way to accomplish this? I can't think of a way to store a T and do away with bit validity requirements other than by wrapping in MaybeUninit, and I can't think of a way to ensure that the type has the same size as T without including one in the struct.

cc @kupiakos

If it's unsized, i.e. doesn't have a size known at compile time, there is no way to ensure this at compile time by definition. Different instances could have different sizes. You can ensure it at run time if you have a way to get the size from the instance. In Rust this is typically done by way of a wide pointer.
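The runtime route looks like this in practice. `size_of_val` reads the size out of the pointer metadata for slices, `str`, and trait objects. In current Rust, references to such types are two words wide instead of one. The helper names `runtime_size` and `pointer_words` are made up for this sketch.

```rust
use std::mem::size_of_val;

/// Byte size of a possibly unsized value, taken from the wide pointer's
/// metadata (slice length or vtable) when `T` is unsized.
pub fn runtime_size<T: ?Sized>(value: &T) -> usize {
    size_of_val(value)
}

/// Width of `&T` in units of `usize`: 1 for a thin pointer, 2 for a wide one.
pub fn pointer_words<T: ?Sized>() -> usize {
    std::mem::size_of::<&T>() / std::mem::size_of::<usize>()
}
```

Two `&[u32]` values can report different sizes, which is exactly why no single compile-time size exists for `[u32]`.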

You could perhaps have a

use std::marker::PhantomData;

#[repr(transparent)]
struct MaybeValid<T: ?Sized> {
    ty: PhantomData<T>,
    data: [u8],
    // or [MaybeUninit<u8>] but you said all bytes were initialized
}

and deal with (wide) pointers to this type. However, custom DSTs[1] are still a half-baked feature in Rust and typically need unsafe and/or nightly to work with effectively.
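Here is a minimal sketch of working with that custom DST through wide pointers, assuming the `[u8]` variant. The hypothetical `from_bytes` builds a `&MaybeValid<T>` from any byte slice with a pointer cast, which keeps the slice length as the wide pointer's metadata. Note what it does not check: nothing ties the byte count to `T`, so a 5-byte `MaybeValid<[u16]>` is accepted. Any size check would have to happen at run time, in a constructor like this one.

```rust
use std::marker::PhantomData;

#[repr(transparent)]
pub struct MaybeValid<T: ?Sized> {
    ty: PhantomData<T>,
    data: [u8],
}

impl<T: ?Sized> MaybeValid<T> {
    /// Hypothetical constructor: reinterprets any byte slice.
    pub fn from_bytes(bytes: &[u8]) -> &Self {
        // SAFETY: `Self` is `repr(transparent)` over `[u8]` (the PhantomData
        // is a 1-ZST), and the cast carries the slice length over unchanged.
        unsafe { &*(bytes as *const [u8] as *const Self) }
    }

    /// Returns the underlying bytes.
    pub fn as_bytes(&self) -> &[u8] {
        &self.data
    }
}
```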


  1. dynamically sized types, unsized types

This can be done with a helper trait that MaybeValid's type parameter is bounded by. Most users will not notice the trait bound, since there are blanket impls for both T: Sized and [T].

Hopefully, this sort of workaround will become unnecessary with support for unsized unions and a much more ergonomic MaybeUninit<T: ?Sized>.

use core::mem::MaybeUninit;
use core::slice;

// Really though, this should _only_ be about bit validity and not mess with making it packed;
// use a different newtype for alignment.
#[repr(C, packed)]
pub struct MaybeValid<T: ?Sized + AsUninit>(T::Uninit);

/// Converts a reference to its uninitialized form.
///
/// # Safety requirement (for implementers)
/// - All methods must return the same address as their input.
pub unsafe trait AsUninit {
    /// # Safety requirement (for implementers)
    ///
    /// This type must have the same layout as `Self` but with
    /// no byte validity requirement on its contents.
    type Uninit: ?Sized;

    /// Converts a `&self` to its uninitialized equivalent.
    fn as_ref_uninit(&self) -> &Self::Uninit;

    /// Converts a `&mut self` to its uninitialized equivalent.
    ///
    /// This _upgrades_ the `&mut Self` (read-write-valid-values-only) reference to a
    /// `&mut MaybeUninit<_>` (write-anything) reference.
    ///
    /// # Safety (for callers)
    ///   - The obtained reference must not be used to write uninitialized data
    ///     (`MaybeUninit::uninit()`) or an otherwise invalid `Self` into the
    ///     pointee; it must hold a valid `Self` once the original borrow is used again.
    unsafe fn as_mut_uninit(&mut self) -> &mut Self::Uninit;

    /// Gets a raw pointer to the inner type.
    fn as_ptr(uninit: &Self::Uninit) -> *const Self;

    /// Gets a raw mutable pointer to the inner type.
    fn as_mut_ptr(uninit: &mut Self::Uninit) -> *mut Self;

    // `assume_init` should not be implemented because it is only valid
    // for `T: Sized`, and then you can just use `MaybeUninit::assume_init`.

    /// Converts a `&MaybeUninit<_>` to a `& _`.
    ///
    /// # Safety (for callers)
    ///   - The `Self::Uninit` pointee must be initialized.
    unsafe fn assume_init_ref(uninit: &Self::Uninit) -> &Self;

    /// Converts a `&mut MaybeUninit<_>` to a `&mut _`.
    ///
    /// # Safety (for callers)
    ///   - The `Self::Uninit` pointee must be initialized.
    unsafe fn assume_init_mut(uninit: &mut Self::Uninit) -> &mut Self;
}

unsafe impl<T> AsUninit for T {
    type Uninit = MaybeUninit<T>;

    fn as_ref_uninit(&self) -> &MaybeUninit<T> {
        // SAFETY: `MaybeUninit<T>` has the same size and alignment as `T`.
        unsafe { &*(self as *const T).cast::<MaybeUninit<T>>() }
    }

    unsafe fn as_mut_uninit(&mut self) -> &mut MaybeUninit<T> {
        unsafe { &mut *(self as *mut T).cast::<MaybeUninit<T>>() }
    }

    fn as_ptr(uninit: &MaybeUninit<T>) -> *const T {
        uninit.as_ptr()
    }

    fn as_mut_ptr(uninit: &mut MaybeUninit<T>) -> *mut T {
        uninit.as_mut_ptr()
    }

    unsafe fn assume_init_ref(uninit: &MaybeUninit<T>) -> &T {
        // Forward to the inherent `MaybeUninit` method; there is no `self` here.
        unsafe { uninit.assume_init_ref() }
    }

    unsafe fn assume_init_mut(uninit: &mut MaybeUninit<T>) -> &mut T {
        unsafe { uninit.assume_init_mut() }
    }
}

unsafe impl<T> AsUninit for [T] {
    type Uninit = [MaybeUninit<T>];

    fn as_ref_uninit(&self) -> &[MaybeUninit<T>] {
        // SAFETY: `[MaybeUninit<T>]` has the same layout as `[T]`, and the cast
        // keeps the slice length.
        unsafe { &*(self as *const [T] as *const [MaybeUninit<T>]) }
    }

    unsafe fn as_mut_uninit(&mut self) -> &mut [MaybeUninit<T>] {
        unsafe { &mut *(self as *mut [T] as *mut [MaybeUninit<T>]) }
    }

    fn as_ptr(uninit: &[MaybeUninit<T>]) -> *const [T] {
        uninit as *const [MaybeUninit<T>] as *const [T]
    }

    fn as_mut_ptr(uninit: &mut [MaybeUninit<T>]) -> *mut [T] {
        uninit as *mut [MaybeUninit<T>] as *mut [T]
    }

    unsafe fn assume_init_ref(uninit: &[MaybeUninit<T>]) -> &[T] {
        let len = uninit.len();
        unsafe { slice::from_raw_parts(uninit.as_ptr().cast(), len) }
    }

    unsafe fn assume_init_mut(uninit: &mut [MaybeUninit<T>]) -> &mut [T] {
        let len = uninit.len();
        unsafe { slice::from_raw_parts_mut(uninit.as_mut_ptr().cast(), len) }
    }
}
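This checks the claim that most users won't notice the bound. The sketch is a trimmed-down, self-contained copy of the trait with only `Uninit` and `as_ref_uninit`, plus a hypothetical generic function `uninit_size`. Both a sized value and a slice are accepted without any user-written impl. A type such as `str` or `dyn Trait` is covered by neither blanket impl and would need its own.

```rust
use std::mem::{size_of_val, MaybeUninit};

/// Trimmed-down copy of the helper trait, so this block compiles on its own.
pub unsafe trait AsUninit {
    type Uninit: ?Sized;
    fn as_ref_uninit(&self) -> &Self::Uninit;
}

unsafe impl<T> AsUninit for T {
    type Uninit = MaybeUninit<T>;
    fn as_ref_uninit(&self) -> &MaybeUninit<T> {
        // SAFETY: `MaybeUninit<T>` has the same layout as `T`.
        unsafe { &*(self as *const T).cast::<MaybeUninit<T>>() }
    }
}

unsafe impl<T> AsUninit for [T] {
    type Uninit = [MaybeUninit<T>];
    fn as_ref_uninit(&self) -> &[MaybeUninit<T>] {
        // SAFETY: same element layout; the cast keeps the slice length.
        unsafe { &*(self as *const [T] as *const [MaybeUninit<T>]) }
    }
}

/// Hypothetical generic caller: it states the bound once, and callers never
/// have to name `AsUninit` for sized types or slices.
pub fn uninit_size<T: ?Sized + AsUninit>(value: &T) -> usize {
    size_of_val(value.as_ref_uninit())
}
```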

I think we might be talking past each other a bit. You can already do this today trivially so long as you're willing to accept that your type has bit validity constraints:

#[repr(transparent)]
struct MaybeValid<T: ?Sized> {
    inner: T,
}

My question is just about producing something that is identical to this type except that it has no bit validity constraints (i.e., all byte sequences of valid lengths are valid instances). I grant that actually working with such a thing at runtime (e.g., producing a fat reference) would be difficult, but I'm leaving that for future work 🙂
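The niche is one observable difference between the transparent wrapper and a `MaybeUninit`-based one. Every valid `MaybeValid<bool>` of the transparent kind must hold 0 or 1, so the compiler can use the other byte values to encode `None`. `MaybeUninit<bool>` permits any byte and leaves no such room. The function name `option_sizes` is made up for this sketch.

```rust
use std::mem::{size_of, MaybeUninit};

/// The transparent wrapper: same layout as `T`, but it inherits `T`'s
/// bit validity requirements (and therefore its niche).
#[repr(transparent)]
pub struct MaybeValid<T: ?Sized> {
    inner: T,
}

/// Returns (size of `Option<MaybeValid<bool>>`, size of `Option<MaybeUninit<bool>>`).
pub fn option_sizes() -> (usize, usize) {
    (
        size_of::<Option<MaybeValid<bool>>>(),
        size_of::<Option<MaybeUninit<bool>>>(),
    )
}
```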


Yeaaaah, I was afraid this was going to be the answer. (Warning: inside baseball for zerocopy.) For our use cases, I guess it's fine because we're only ever going to be using this with types that have opted into a derive anyway, so we can just emit impls like that. I was hoping to avoid more derive boilerplate, but it might be unavoidable.

The derive boilerplate is only needed for custom DSTs. It's very possible a #[derive(SliceDst)] will be necessary for all of the pieces to fit together.
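For a custom DST, the per-type impl might look roughly like this. It is a hypothetical sketch of what a `#[derive(SliceDst)]`-style macro could emit, and it uses a trimmed-down trait with only `Uninit` and `as_ref_uninit`. `Packet`, `PacketUninit`, and `from_u16s` are made-up names. The key step is a mirror struct with every field wrapped in `MaybeUninit`, kept in the same order under `#[repr(C)]`, so that a pointer cast between the two preserves both the address and the slice length.

```rust
use std::mem::MaybeUninit;

/// Trimmed-down helper trait, so this block compiles on its own.
pub unsafe trait AsUninit {
    type Uninit: ?Sized;
    fn as_ref_uninit(&self) -> &Self::Uninit;
}

/// A custom DST: a fixed header followed by a byte slice.
#[repr(C)]
pub struct Packet {
    pub header: u16,
    pub body: [u8],
}

/// The mirror type a derive might generate: same fields, same order, same
/// repr, each wrapped in `MaybeUninit` (or a slice of it).
#[repr(C)]
pub struct PacketUninit {
    pub header: MaybeUninit<u16>,
    pub body: [MaybeUninit<u8>],
}

unsafe impl AsUninit for Packet {
    type Uninit = PacketUninit;
    fn as_ref_uninit(&self) -> &PacketUninit {
        // SAFETY: both types are `repr(C)` with pairwise layout-identical
        // fields; both have slice tails, so the cast keeps the length metadata.
        unsafe { &*(self as *const Packet as *const PacketUninit) }
    }
}

impl Packet {
    /// Hypothetical helper: views a `u16` buffer (at least one element) as a
    /// `Packet` whose body is the remaining bytes.
    pub fn from_u16s(buf: &[u16]) -> &Packet {
        assert!(!buf.is_empty(), "need room for the header");
        let body_len = (buf.len() - 1) * 2;
        let raw = std::ptr::slice_from_raw_parts(buf.as_ptr().cast::<u8>(), body_len);
        // SAFETY: `buf` is 2-byte aligned and covers 2 + body_len initialized
        // bytes, which is exactly the size of the resulting `Packet`.
        unsafe { &*(raw as *const Packet) }
    }
}
```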

Fingers crossed that we can avoid it, but maybe so.