The heap, dynamic dispatch, and performance

I am having to use dynamic dispatch to allow different types that implement a specific Trait within a Vec. My question is:

How much performance impact does storing Box<dyn Trait> values in a Vec have, compared to keeping a separate vec for each concrete type and dropping the Box wrapper?

trait MyTrait {}
struct MyStruct1 {}
struct MyStruct2 {}
struct MyStruct3 {}

impl MyTrait for MyStruct1 {}
impl MyTrait for MyStruct2 {}
impl MyTrait for MyStruct3 {}

/// How is the performance of this
struct DynStore {
    inner: Vec<Box<dyn MyTrait>>,
}
/// Compared to this?
struct ExplicitStore {
    inner1: Vec<MyStruct1>,
    inner2: Vec<MyStruct2>,
    inner3: Vec<MyStruct3>,
}

In relative terms, significant. In absolute terms, possibly insignificant.

With 3 concrete vecs you'll get contiguous memory layout, monomorphized code for each type, and maybe even autovectorization.

With dyn you get data scattered over the heap, and loops doing dynamic dispatch without knowledge of the type they're working with.

But keep in mind that both versions may be pretty fast in practice, unless you're processing millions of items.

There's a third option that's a compromise between the two — a Vec of enums.


Kornel, I find it interesting that as the distance between pointers increases, the time required to fetch the data increases (especially on the heap). At this level, does it have something to do with physical constraints (e.g., the path distance of an electron)? Do you know why?

"distance" is not a good metric for this. What matters for modern CPUs is cache utilization, and there are many factors in it, such as things sharing cache lines, prefetch of the next cache line, and the size of the working set. A very rough approximation is that when things are "close" they're faster, but that doesn't mean the cost is proportional to pointer distance; rather, nearby stuff is more likely to be in the fragments of memory that happen to be cached already.


That makes sense, since at the speed of light the change in distance between physical sectors is insignificant. But yeah, if you have data stored down near the CPU in L1/L2 cacheland, you're definitely going to get a speed boost, since it's right next to the CPU physically. Whereas with RAM, I look at my motherboard and see it slotted physically "far away" from the CPU.

There is also a kind of middle ground presented in this blog post.


Given that this relies on the layout of trait objects (to get the vtable), it is likely UB.


Why? I mean, it's not set in stone (yet), but we know what it looks like. While waiting for std::raw::TraitObject, it seems decent enough.


It's certainly UB if you have to mutate the inner data such that it becomes resized. Pushing and popping from the back and front should be safe, so long as the data doesn't change in size.

Oops, I somehow missed std::raw::TraitObject.

The brain is fallible, give yourself a break :wink: