DSTs on the stack with const max size

I made a little proof of concept for a type Size<T, SIZE> that stores an instance of the unsized type T on the stack as long as it takes up less than or exactly SIZE bytes. The main use case I could see for this is using it with dyn Fn as a substitute for boxed closures.

Example usage:

let a = 1;
let b = vec![1,2,3];
let c = (true, false);
// you can store them all under the same type!
let anys: Vec<Size<dyn Any, 24>> = vec![Size::new(a), Size::new(b), Size::new(c)];
let x = 1;
let f = move || x;
let y = 2;
let z = 3;
let g = move || y + z;
let functions: Vec<Size<dyn Fn() -> i32, 8>> = vec![Size::new(f), Size::new(g)];
for function in functions {
    // it implements Deref so you can actually use the trait
    println!("{}", function());
}

The way it works is by storing the value in a buffer of [u8; SIZE], but then also storing the metadata of the unsized type T (i.e. length for [T] and str and a vtable pointer for dyn Trait) which allows it too implement Deref and DerefMut, as well as Drop.

Does this sound useful to anyone else? Do you know if someone has already implemented something like this? I did have to use some unstable feature flags (ptr_metadata and Unsize) as part of my implementation so I don't really know how much of a good idea it is to actually use this.

Full implementation:

I think you need to somehow include alignment in the type as well.

3 Likes

Yes, I think it would be useful for avoiding boxing of elements in largish collections.

Your first example uses a Vec instead of a slice. Is is possible to use the slice and str types directly with this?

1 Like

There are some crates[1] that do things like this for embedded use cases and such.


  1. e.g. and I know there are others ↩︎

1 Like

This does work with any unsized type so you could do Size<[T], SIZE>, but that seems a bit redundant to me because you can already do [T; LENGTH]. Though Size<[T], SIZE> does allow to store an array up to a certain size while [T; LENGTH] requires the array have exactly LENGTH elements, so maybe that's something.

It's a similar story for strings - even though there is no str<LENGTH> type built in, it shouldn't be hard to create one because a string with length LENGTH is basically just stored in memory as [u8; LENGTH]. Again, maybe if you want to store a string up to a certain size it could prove useful? But I think making a specific datatype for strings with a constant length would be preferable.

The one problem with strings is that there is no type that implements Unsize<str> (i.e. there is no sized type T for which you can cast from &T to &str). As usual, the solution for this is in a feature flag, this one is called clone_to_uninit which provides the CloneToUninit trait that allows you to clone unsized types, here's my go at it:

As for using a vector, I used a vector just because it's an example of a datatype that requires that all of the elements in it to have the same type. I could have totally used [Size<dyn Fn() -> i32, 8>; 3] instead, but I feel like if you have a constant sized array the different types become less of an issue because you could just use a tuple of (typeof f, typeof g, typeof h) instead.

1 Like

I don't think there is a way to do this other than just making a whole bunch of copies of the same struct with different types for the buffer in order to force alignment:

pub struct SizeAlign1<T: ?Sized + Pointee, const SIZE: usize> {
    metadata: T::Metadata,
    buffer: [u8; SIZE],
}
pub struct SizeAlign2<T: ?Sized + Pointee, const SIZE: usize> {
    metadata: T::Metadata,
    buffer: [u16; SIZE],
}
pub struct SizeAlign4<T: ?Sized + Pointee, const SIZE: usize> {
    metadata: T::Metadata,
    buffer: [u32; SIZE],
}
.
.
.

Edit: After looking through the source code of sized_dst I found this hack:

pub struct Size<T: ?Sized + Pointee, Align = u8, const SIZE: usize> {
    metadata: T::Metadata,
    buffer: [u8; SIZE],
    _align: [Align; 0]
}

// align of 2
let f = Size::<dyn Fn(), u16, 8>::new(|| ());

let data_for_g: (u32, i32) = (12, -13);
// same align as data_for_g, which is 4
let g = Size::<dyn Fn(), (u32, i32), 8>::new(|| ());

That looks really interesting. I looked a bit through their source code and what they're basically doing is storing a vtable of an implementation of Futute<Output=T> on the stack, which is better then what I'm doing because Size<dyn Future<Output=T>, SIZE> would store a pointer to a vtable instead, which I believe is less performant because you have to traverse two pointers instead of one. Sadly rust doesn't allow you to have structs that are generic over traits so if you wanted to store the vtable for another trait on the stack you would have to implement that for that trait specifically - maybe it's possible to make a macro for that? Overall I still think there's value in a generic implementation of this.

You mentioned there are other crates that do this, and the crate you talked about gave me the search terms to find those:

  • stack_dst - which allows you to allocate dsts to pretty much any buffer you want - weather on the stack or not - but that comes with the trade off that the size of the buffer is not known at compile time so you have to do runtime checks to make sure the value doesn't overflow. It also implements a FIFO and a stack that are stored in a buffer and can contain dsts.
  • sized_dst - which is implemented almost identically to my implementation, except it also supports custom alignment.
1 Like

Please note that you cannot store arbitrary types in [u8; SIZE], as that type is not valid for uninitialized bytes, and you also cannot store provenance in there, meaning that references that are stored in this type cannot be turned back into the original type (and pointers become invalid).

Your original playground has UB as soon as MaybeUninit::uninit().assume_init(), because you cannot construct [u8; SIZE] that way (a simple run with miri detects this).

You need to use [MaybeUninit<u8>; SIZE]. You can even construct that type safely with [const { MaybeUninit::uninit() }; SIZE] (though uninit().assume_init() is also explicitly allowed for this case).

2 Likes

You're totally right but, regarding provenance, I don't think there is much I could do here. There isn't really a point in putting a reference inside a Size<T, SIZE> anyway because you could just do &T, so I don't think it matters that much. In case you were talking about the provenance of the references returned by deref and deref_mut, the docs say that pointer casting preserves provenance so maybe here that applies too? I honestly don't know.

The problem is that pointer -> integer (array) -> pointer is not a lossless conversion.
This also applies to pointers in fields, Boxes, Vecs, etc. If you do not have a way to deal with this, you do not have a way to write a sound library (one where UB cannot be reached from safe code) that leverages this.

Luckily, you actually can do something about it. The MaybeUninit<u8> array is actually guaranteed to preserve the provenance of all its representation bytes.

This code can be implemented completely soundly as a library. (edit: forgot about cells, see below)

I might be missing something here but how can this cause UB? Doesn't loss of provenance just mean rust has less opportunities for optimization?

To be clear: I do agree with you that I should have used an array of MaybeUninits, I just don't understand how my previous implementation could cause undefined behavior.

Here's an article about the topic.

Saying Rust doesn't have provenance would mean less optimizations are possible -- to an unacceptable level most likely. Saying Rust does have provenance would mean more optimizations are possible, i.e. valid -- and if UB results from those optimizations, your unsafe is at fault.

Rust has opted to have provenance.

Your previous implementation has undefined behavior, as soon as you are using uninitialized memory for types that are not valid over it, or you are creating a reference without provenance, potentially from a box/vec that was previously turned into integers. This is regardless of any optimizations performed by the compiler. Optimizations are just the motivating example commonly used to explain why we want UB to be a thing. In reality, it is like in maths if you were to plug something into a function that is outside the domain of the function; it is meaningless. A code path that causes UB has no semantic meaning, regardless of what you think the code does on a "lower level". In the case of provenace, the RFC describes well that pointers are not just addresses where stuff is, despite generally being represented like this in assembly.

By the way, I also just realized that you cannot turn a &[MaybeUninit<u8>; N] back into a reference to the original type in general, because of cells. The type needs to be core::marker::Freeze for this to be sound. I think MaybeUninit<UnsafeCell<u8>> may be correct if you wrap the type in an UnsafeCell before putting it in the buffer.