Vec<Box<dyn Trait>> vs. using enum with Trait

I've got several different objects of varying sizes/complexities which all implement the same trait. My desired use case is to be able to parallel map over each object (as efficiently as possible) multiple times within a continuously running loop.

The options seem to be either an indirect Vec<Box<dyn Trait>> or to make a single enum to get the trait implemented for a compile-time Sized object which would allow going to [TraitEnum; N].

The problem with the enum option is that each struct is based on a generic type (like a typenum) which sets the size of types to use for its fields from Vector3/Matrix3 up to Vector6/Matrix6. That alone only gives a size ratio of 2:1 to 4:1 (6/3 to 36/9). But on top of that, the different structs also have significantly varying sizes, with the largest likely being about 8x the smallest. Thus the final enum size ratio could be up to ~30:1, which if I recall correctly is well above what Clippy recommends.

The two options aren't drastically difficult to implement, so I may just end up making both and conditionally compiling two versions or allowing runtime selection for benchmarking. But, I'm leaning towards dropping the enum option entirely. Does anyone have any thoughts on the performance of the Vec<Box<dyn Trait>> option? The enum feels like a relatively small optimization with a bad resulting memory footprint.

I suspect the biggest performance penalty will come from the extra memory fragmentation of using Boxes. You could also investigate using smallbox, which will store large items on the heap and small ones inline.

1 Like

Thanks for the suggestion! I did not know smallbox existed. A lot of the various objects are probably below the size limit. And the larger objects will have significantly more computationally expensive methods, so the impact of being on the heap would be lower relatively.

And now I'm realising that all it takes is one enum variant being too large and then all variants would end up forced onto the heap... What is the struct size_of() limit for stack allocation on common x86 desktop processors? I recall reading that it was around 4kB.

For the smallbox to be helpful, you'll need to use the trait object approach, with a type like [SmallBox<dyn Trait, _>; N]. I haven't ever used it, though, so I don't know how you construct one of those.

Yeah, the enum option is definitely out if I can't keep ALL of the variants below the size limit. But! If ALL of the variants can be stack allocated, then smallbox should automatically be better than an enum.

It seems that smallbox:

  1. allocates a fixed-size block of memory on the stack (using a struct that holds a fixed-sized array)
  2. overwrites the object onto that block
  3. and then the SmallBox gets a pointer to the object

The docs say you can specify your own sizes as well. So I could custom size a SmallBox for every object. This gets a little more complex as I'm also aiming to have both f64 and f32 versions (or more!). So with 2 primitives * ~10 types * 4 vector/matrix sizes each comes to 80 different sizes. Which makes me wonder, could I use mem::size_of() to select the size to pass each smallbox!() macro call to let the compiler do all the work?

I'm looking to be able set up any number of parallel runtimes. So optimizing an entire runtime's memory footprint to be as small and easy to load from heap to stack as possible seems like a better goal than just trying to enforce inefficient stack allocation (though that maybe simpler and perform just as well for a single runtime).

Typical approach to combat an enum variant that’s outsized compared to the rest is to store that variant’s data on the heap (eg put in a Box), ie:

enum MyEnum {
   X(SmallStruct),
   Y(SmallStruct2),
   Z(Box<LargeStruct>),
}

Enum vs trait object is mostly about closed vs open set of types/impls, less about size differences.

5 Likes

Ahh, of course! Use exactly the thing I asked about originally :rofl:. I can be really good at missing the obvious sometimes.

Well, I've got two seemingly fully workable options; looks like it's time to check actual sizes of the smallest and largest variants.

I'm still leaning away from the enum, though. The smallbox route seems like it could allow for some serious optimization. The 'wasted space' per object with smallbox is just a pointer (plus PhantomData)

/// An optimized box that store value on stack or on heap depending on its size
pub struct SmallBox<T: ?Sized, Space> {
    space: ManuallyDrop<Space>,
    ptr: *const T,
    _phantom: PhantomData<T>,
}

Well, size_of() is showing between about 50 and 800 bytes... a 16x difference. But, I think it's still below the size limit where I'd need to Box any of the enum variants!

4 kB of stack for a modern desktop machine would be wayyyy to small. It's usually a couple of megabytes. I think 8 MB on Linux and 1 MB on Windows.

The limit wasn't cache size, but internals. I had to look it up again - I saw it in the Rust in Action chapter on memory. It claims a maximum size of 4kb for any 'hot working' portion will keep lookups fast. The follow on to that was 400kb as the next limit in order to keep the CPU translation cache happy. Been a while since I read that, I guess I incorrectly took 4kb to be the stack/heap split limit. But that 400kb limit seems more realistic as a sizeable fraction of actual cache sizes.

Access order makes a huge difference. Memory reads are O(1) if done in a way that the prefetcher predicts, but O(√working-set-size) if functionally random.

3 Likes

Well, after a bit more investigating, smallbox seems to have the same allocation size issue as an enum. The function return signature must provide the size for stack/heap split. So, I'll end up with Vec<SmallBox<dyn Trait, S64>> or whatever size I end up needing to fit things. Effectively the same as making an enum that is already as big as S64.

Does your statement imply that heap allocation of memory is not terribly problematic as long as its access is ordered and predictable? The case that I want to avoid is loading lots of 1kb enums if most of them only contain 100b of actual data. So the vector option might be likely to come out on top?

The furthest extreme I'm now considering is trying to initialize some kind of larger arena fixed-block of memory. It'd be tedious with typenum, but not terrible. I'd just have to make a block large enough to store an entire runtime worth of structs, such that the Vec<Box<dyn Trait>> can be an array of pointers instead. (would it be [*const dyn Trait; N]?)

Guess I'll make Vec<Box<dyn Trait>> work, see how using an enum instead changes things, and then consider whether to bother with anything more extreme. Premature optimization is bad, except that I keep learning more from trying it.

Would something like this be viable? Basically trying to merge the concept of smallbox with a fixed size array of pointers into an owned block of memory.

use core::mem::ManuallyDrop;

pub trait ArrayTrait {}
pub trait Dim { const VALUE: usize; }

// Biggest challenge -> guarantee that only N objects with total size less than
// S can be provided. Fallback -> S must allow for N `Box<dyn ArrayTrait>`s.
pub struct DynTraitArray<N: Dim, S> {
    array: [*const dyn ArrayTrait; <N as Dim>::VALUE],
    space: ManuallyDrop<S>,
}

Notes:

  • Memory layout to be fixed for the life of the struct
    • mutable struct just needs to be able to iter_mut() the held structs
  • Don't need to make it work as a standalone crate
    • can have runtime checks and panic! wherever easier