How to work with `!Sized` types in Rust


Below are the random things I thought of as I read through the article.

Sizedness in Rust is a particular topic.

Every topic is a particular topic. Maybe you meant... peculiar? interesting? niche?

There are some other article/plurality nits that aren't a big deal; I won't point them out.

Now this is very practical. This can be used everywhere to store dynamic data. The data can then easily be

(Incomplete sentence.)

VecView is the size of three pointers: one for the length, one for the start of the slice, and one for its length. Since the offset between the length and the beginning of the slice is known at compile-time, one pointer could be done without.

Call the buffer length "capacity" to avoid confusion. It took a couple re-reads of this paragraph to realize you meant that the len and buffer fields point to memory a fixed offset apart.

There are really no way to create a fat pointer to a user-defined structure, they can only be created by the compiler itself.

I don't think that's true, but I'll keep reading.

...(done reading). Ok I'll make a follow up comment in a bit.

I had the brilliant idea of actually reading documentation on how unsized types can be constructed.

Also mentioned here.


Here’s another idea for another approach, utilizing the more niche corners of what Rust’s Unsize can support.

use core::mem::MaybeUninit;

pub trait ArraySize: sealed::Sealed {
    type ArrayType<T>: ?Sized;
}

pub struct Const<const N: usize>;
pub struct Slice; // <- not sure about the best name here

impl<const N: usize> ArraySize for Const<N> {
    type ArrayType<T> = [T; N];
}
impl ArraySize for Slice {
    type ArrayType<T> = [T];
}

mod sealed {
    pub trait Sealed {}
    impl<const N: usize> Sealed for super::Const<N> {}
    impl Sealed for super::Slice {}
}

pub struct Vec_<T, N: ArraySize> {
    len: usize,
    buffer: N::ArrayType<MaybeUninit<T>>,
}

pub type Vec<T, const N: usize> = Vec_<T, Const<N>>;
pub type VecView<T> = Vec_<T, Slice>;

impl<T, const N: usize> Vec<T, N> {
    pub fn as_mut_view(&mut self) -> &mut VecView<T> {
        self
    }
}

impl<T> VecView<T> {
    pub fn push(&mut self, value: T) -> Result<(), T> {
        // If `len` already equals the buffer's capacity, no more items can be stored.
        if self.len == self.buffer.len() {
            return Err(value);
        }

        unsafe { *self.buffer.get_unchecked_mut(self.len) = MaybeUninit::new(value) };
        self.len += 1;
        Ok(())
    }
}

I suppose this can be a bit easier to work with, e.g.:

mod class {
    pub struct Class;
}
struct Instruction;

// we can just use the `ArraySize` parameter if we (the user) don’t want
// the extra steps of sealing again, and defining aliases again,
// and making sure everything’s properly documented, …
pub struct Command<N: ArraySize> {
    class: class::Class,
    instruction: Instruction,
    p1: u8,
    p2: u8,
    le: usize,
    extended: bool,
    data: Vec_<u8, N>,
}

fn test_coercion<const N: usize>(x: &mut Command<Const<N>>) -> &mut Command<Slice> {
    x
}

The downside (well, one downside I can come up with) is that going through a trait for ArrayType kills covariance: a type parameter that only appears inside an associated-type projection is treated as invariant, so e.g. this breaks

fn test_variance<'a: 'b, 'b>(x: Vec<&'a (), 42>) -> Vec<&'b (), 42> {
    x
}

(this is likely a deal-breaker for following this approach in a situation where the API is already stable covariantly)
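For contrast, here is a minimal sketch of the same variance test against a definition that names the array type directly instead of going through an associated type. `PlainVec`, `new`, `len`, and `shorten` are hypothetical names used only for illustration; the assumption is that the buffer is a plain `[MaybeUninit<T>; N]`, which is covariant in `T`.

```rust
use core::mem::MaybeUninit;

// Hypothetical "plain" definition: the buffer type is spelled out directly,
// with no trait projection between `T` and the field type.
pub struct PlainVec<T, const N: usize> {
    len: usize,
    buffer: [MaybeUninit<T>; N],
}

impl<T, const N: usize> PlainVec<T, N> {
    pub fn new() -> Self {
        Self {
            len: 0,
            // Builds an array of uninitialized slots without any unsafe code.
            buffer: core::array::from_fn(|_| MaybeUninit::uninit()),
        }
    }

    pub fn len(&self) -> usize {
        self.len
    }
}

// This compiles because `PlainVec<T, N>` is covariant in `T`:
// a `PlainVec<&'a (), 42>` can be used where a `PlainVec<&'b (), 42>`
// is expected whenever `'a` outlives `'b`.
pub fn shorten<'a: 'b, 'b>(x: PlainVec<&'a (), 42>) -> PlainVec<&'b (), 42> {
    x
}
```

With the trait-based `Vec_` above, the equivalent function is rejected, which is the breakage being described.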


One nice consequence of this here is that we can have a generic API, e.g. making push work with one implementation for both Vec and VecView (and usable generically, too) … well, we need some trait bound then, such as

pub trait ArraySize: sealed::Sealed {
    type ArrayType<T>: ?Sized + AsMut<[T]>;
}

and with a little modification this can be even nicer to use

pub trait ArraySize: sealed::Sealed {
    type ArrayType<T>: ?Sized + std::ops::DerefMut<Target = [T]>;
}

which needs a little adjusting of the contained types to get that DerefMut going…

pub struct DerefWrapArray<A: ?Sized>(pub A);
impl<T, const N: usize> std::ops::Deref for DerefWrapArray<[T; N]> {
    type Target = [T];
    fn deref(&self) -> &Self::Target {
        &self.0
    }
}
impl<T, const N: usize> std::ops::DerefMut for DerefWrapArray<[T; N]> {
    fn deref_mut(&mut self) -> &mut Self::Target {
        &mut self.0
    }
}
impl<T> std::ops::Deref for DerefWrapArray<[T]> {
    type Target = [T];
    fn deref(&self) -> &Self::Target {
        &self.0
    }
}
impl<T> std::ops::DerefMut for DerefWrapArray<[T]> {
    fn deref_mut(&mut self) -> &mut Self::Target {
        &mut self.0
    }
}

impl<const N: usize> ArraySize for Const<N> {
    type ArrayType<T> = DerefWrapArray<[T; N]>;
}
impl ArraySize for Slice {
    type ArrayType<T> = DerefWrapArray<[T]>;
}

Now, as soon as that’s done, your push method generalizes immediately and without any change to the code:

impl<T, N: ArraySize> Vec_<T, N> {
    pub fn push(&mut self, value: T) -> Result<(), T> {
        // If `len` already equals the buffer's capacity, no more items can be stored.
        if self.len == self.buffer.len() {
            return Err(value);
        }

        unsafe { *self.buffer.get_unchecked_mut(self.len) = MaybeUninit::new(value) };
        self.len += 1;
        Ok(())
    }
}

Rust Playground


What I was talking about with creating wide pointers

There are really no way to create a fat pointer to a user-defined structure, they can only be created by the compiler itself.

There's a way to create a wide pointer to a user-defined structure that is robust in the face of unspecified layouts: casting pointers with the same metadata.

Type of e | U | Cast performed by e as U
*T | *V where V: Sized (*) | Pointer to pointer cast

(*) or T and V are compatible unsized types, e.g., both slices, both the same trait object.

And in particular, for slices and DSTs using slices:

For slice types like [T] and [U], the raw pointer types *const [T], *mut [T], *const [U], and *mut [U] encode the number of elements in this slice. Casts between these raw pointer types preserve the number of elements. Note that, as a consequence, such casts do not necessarily preserve the size of the pointer's referent (e.g., casting *const [u16] to *const [u8] will result in a raw pointer which refers to an object of half the size of the original). The same holds for str and any compound type whose unsized tail is a slice type, such as struct Foo(i32, [u8]) or (u64, Foo).

See also.

Additionally, for slices, core gives us ptr::slice_from_raw_parts_mut, so we can create slice pointers in particular without having to maintain the invariants of references, on stable today.
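A small sketch of the metadata-preserving behavior quoted above, using only raw slice pointers. The function name `element_count_after_cast` is hypothetical; the point is that the element count survives the cast while the referent shrinks.

```rust
use core::ptr::slice_from_raw_parts_mut;

// Returns (element count of the original slice pointer,
//          element count after casting to a byte slice pointer).
pub fn element_count_after_cast() -> (usize, usize) {
    let mut words = [0x0102u16, 0x0304, 0x0506];
    let count = words.len();
    let data = words.as_mut_ptr();

    // Build a wide pointer by hand: data address plus element count.
    let wide: *mut [u16] = slice_from_raw_parts_mut(data, count);

    // Pointer-to-pointer cast between slice types: the metadata (3) is kept,
    // so the new pointer covers 3 bytes rather than the original 6.
    let bytes = wide as *mut [u8];

    // SAFETY: the first 3 bytes of `words` are initialized and in bounds,
    // `u8` has alignment 1, and no other reference to `words` is in use here.
    let byte_count = unsafe { (&*bytes).len() };

    (count, byte_count)
}
```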

How to make use of it

We want to first construct a *mut [MaybeUninit<T>] with the proper metadata (buffer capacity), but with the data address portion of the wide pointer pointing at the containing struct. Then you can perform a pointer cast to get a *mut VecView<T> with the appropriate metadata and data address.

Your Vec and VecView[1] have the same layout (alignment and field offsets), modulo the latter being unsized. So the data address of the desired slice wide pointer can be the same as the original (data) pointer address (*mut Vec<T, N>).

slice_from_raw_parts_mut takes a *mut T to construct a *mut [T], and the metadata (buffer capacity) is N, so that would be:

        let data = self as *mut Self as *mut MaybeUninit<T>;
        let vv: *mut [MaybeUninit<T>] = slice_from_raw_parts_mut(data, N);

And then the pointer cast (and then borrowing as &mut):

        unsafe { &mut *(vv as *mut VecView<T>) }

And... that's it really.
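Put together, the steps above might look like the following sketch. It is not the article's code: `FixedVec` stands in for the article's `Vec`, the `capacity`, `len`, and `as_slice` helpers are hypothetical, and both structs are marked `#[repr(C)]` as an added assumption so that the matching field offsets are guaranteed rather than relied upon.

```rust
use core::mem::MaybeUninit;
use core::ptr::slice_from_raw_parts_mut;

// Assumption: `#[repr(C)]` on both types pins down identical field offsets.
#[repr(C)]
pub struct FixedVec<T, const N: usize> {
    len: usize,
    buffer: [MaybeUninit<T>; N],
}

#[repr(C)]
pub struct VecView<T> {
    len: usize,
    buffer: [MaybeUninit<T>],
}

impl<T, const N: usize> FixedVec<T, N> {
    pub fn new() -> Self {
        Self {
            len: 0,
            buffer: core::array::from_fn(|_| MaybeUninit::uninit()),
        }
    }

    pub fn as_mut_view(&mut self) -> &mut VecView<T> {
        // Step 1: a slice wide pointer whose address is the whole struct
        // and whose metadata is the buffer capacity `N`.
        let data = self as *mut Self as *mut MaybeUninit<T>;
        let vv: *mut [MaybeUninit<T>] = slice_from_raw_parts_mut(data, N);
        // Step 2: the pointer cast keeps the metadata, giving a wide
        // pointer to `VecView<T>` with capacity `N`.
        // SAFETY: both types share alignment and field offsets, the pointer
        // comes from a live `&mut self`, and the result borrows `self`.
        unsafe { &mut *(vv as *mut VecView<T>) }
    }
}

impl<T> VecView<T> {
    pub fn capacity(&self) -> usize {
        self.buffer.len()
    }

    pub fn len(&self) -> usize {
        self.len
    }

    pub fn push(&mut self, value: T) -> Result<(), T> {
        if self.len == self.buffer.len() {
            return Err(value);
        }
        self.buffer[self.len] = MaybeUninit::new(value);
        self.len += 1;
        Ok(())
    }

    pub fn as_slice(&self) -> &[T] {
        // SAFETY: `push` is the only way to grow `len`, so the first `len`
        // slots are initialized.
        unsafe { core::slice::from_raw_parts(self.buffer.as_ptr() as *const T, self.len) }
    }
}
```

The sketch has no `Drop` implementation, so stored elements are never dropped; that is fine for types like `u8` but would leak anything that owns resources. The fields are kept private because `as_slice` is only sound as long as `len` never exceeds the number of initialized slots.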

[2]

However

You do miss out on the coercion goodness, so I'm guessing this is just a factual correction to your article and not a direction you want to pursue further.
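For comparison, here is a minimal sketch of what the coercion route looks like when the struct is generic over its last field. The names `VecInner`, `OwnedVec`, `ViewVec`, `as_mut_view`, and `capacity` are hypothetical; the assumption is only that the buffer is the final field and the sole use of the type parameter, which is what lets the compiler build the wide pointer itself.

```rust
use core::mem::MaybeUninit;

// The last field is the generic parameter, so `VecInner<[X; N]>` can be
// unsized to `VecInner<[X]>` by the compiler.
pub struct VecInner<B: ?Sized> {
    len: usize,
    buffer: B,
}

pub type OwnedVec<T, const N: usize> = VecInner<[MaybeUninit<T>; N]>;
pub type ViewVec<T> = VecInner<[MaybeUninit<T>]>;

// No unsafe code and no manual pointer construction: returning `v` is an
// unsized coercion, and the capacity `N` becomes the pointer metadata.
pub fn as_mut_view<T, const N: usize>(v: &mut OwnedVec<T, N>) -> &mut ViewVec<T> {
    v
}

pub fn capacity<T>(v: &ViewVec<T>) -> usize {
    v.buffer.len()
}
```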


  1. at this point in the presentation ↩︎

  2. The example in main illustrates that you need to keep VecView behind a privacy barrier or add unsafe to the constructor or something, since I could have just changed the length to 5 without writing any data. ↩︎


I didn't know that was specified and could work!

I updated the article with all your feedback. Thanks a lot!
