Performance implications of `Box<Trait>` vs `enum` delegation


#1

I was wondering if anyone knows what the relative performance implications of using a Box<Trait> vs having an enum that wraps a subset of types that implement Trait and delegates all methods to them.

Some code as an example, say there were 2 types to be supported both implementing std::io::Read:

struct Foo { ... }
struct Bar { ... }

impl Read for Foo { ... }
impl Read for Bar { ... }

enum Wrapper {
    Foo(Foo),
    Bar(Bar),
}

impl Read for Wrapper {
    fn read(&mut self, buf: &mut [u8]) -> Result<usize> {
        match *self {
            Wrapper::Foo(ref mut foo) => foo.read(buf),
            Wrapper::Bar(ref mut bar) => bar.read(buf),
        }
    }

    // ... delegation for the rest of the methods
}

For a small case like this I feel like the Wrapper::read function could probably be inlined and there would be very little overhead compared to the dynamic dispatch for a Box<Read>. However at some point as you add more types to support Wrapper::read itself will become too large to inline. I assume the performance overhead of delegating to a separate function that dispatches based on the enum tag will be comparable to the overhead of vtable based dynamic dispatch?

In the cases I am trying to decide which to use there are probably a half dozen types that need supporting, it’s a semi-open set of types where I don’t really need to be able to tell which type it is so using a Boxed trait would be fine, but since it’s a limited set I would also be fine with having to extend the enum and recompile to add a new type if that would be noticeably faster (definitely using a macro to implement the delegation).

I realise there is also the memory usage to take into consideration. I believe generally the memory usage of the different variants I will be supporting should be similar, and I will definitely be using #![warn(variant_size_differences)] to keep track of them. But if anyone has any experience/comments around potential issues with that I would appreciate them also.


#2

This is a good question. I suspect it’ll be hard to say for sure, in general - would need to benchmark a specific case.

The other performance implication here, besides virtual dispatch (or not), is data locality. With the Box<Trait>, you’ll need a guaranteed memory load to get at the data; with the enum, the data is inline (assuming nothing is boxed or otherwise residing on the heap) and you can get good locality even if the function isn’t inlined.

There’s also something to be said about non-performance related aspects of working with a (Sized) enum with a known set of variants (exhaustive match, etc) vs a completely opaque open ended Box<Trait>. I know you’d like to focus on the performance aspects, but if those seem like a wash, these other aspects bear consideration.


#3

I’ve never considered doing it this way. If you get a macro that works with some generality, please put it up on crates.io!


#4

Here are the performance differences that I would expect between these two solutions, borrowing from what you and @vitalyd said and adding a bit of my own:

  • Enums keep all the data on the stack, whereas Box lives on the heap, so the usual stack vs heap tradeoffs apply (large stack objects are expensive to move around, but heap requires a pointer indirection and thus reduces cache locality).
  • Enums are as large as the largest variant, which as you mentioned can cause memory bloat.
  • Enum-based code makes it much easier for the compiler to know which variant it is dealing with, and to remove the pattern matching indirection in situations where the context provides this information. Thus, I would expect the enum-based solutions to perform better at devirtualization (eliminating repeated table lookups when calling multiple virtual methods of the same object).
  • Where the compiler does not know the enum variant, I would expect enum matching to have similar code complexity w.r.t. a vtable: you need to locate the method in the jump table and then to call it, as with a vtable. Note, however, that enum jump tables are indexed on variants, whereas vtables are indexed on methods, so the two forms of table lookup will have vastly different cache locality sweet spots.

As as vitalyd mentioned, the usability differences are also important when making a choice:

  • Enums represent a closed set of type, trait objects represent an open set.
  • In terms of code cleanness, enums cope better with lots of methods acting on a small set of variants, whereas trait objects cope better with lots of variants sharing a small set of methods. Conveniently enough, that’s also the setting that minimizes the size of the jump tables used by each solution, increasing the odds that they fit in the cache.
  • Match statements allow you to write different code for each variant, a possibility which you do not exploit by using them as a vtable. Diverging code across match arms has a more complex performance trade-off (it effectively amounts to forcing the inlining of method code in your solution).
  • Even when living outside of a Box, trait objects require pointer-based access, which in Rust can reduce usability compared to enums since mutable data cannot be aliased. Enums do not require pointer indirection, for example they can be stored in containers as-is.