Enum dispatch, casting between mutable and const variants of the ref struct

I have a project where I use enum dispatch (as opposed to dyn) in order to get better control over inlining and branch prediction.

The issue I'm having is that I have mutable and const methods, and therefore I need two variants of the ref struct. The code is structured like this:

trait MyTrait {
    fn const_f(&self);
    fn mut_f(&mut self);
}

struct TypeA {}
impl MyTrait for TypeA {
    fn const_f(&self) {}
    fn mut_f(&mut self) {}
}

struct TypeB {}
impl MyTrait for TypeB {
    fn const_f(&self) {}
    fn mut_f(&mut self) {}
}

enum MyRef<'a> {
    TypeA(&'a TypeA),
    TypeB(&'a TypeB),
}

impl MyRef<'_> {
    fn const_f(&self) {
        match self {
            Self::TypeA(t) => t.const_f(),
            Self::TypeB(t) => t.const_f(),
        }
    }
    fn mut_f(&mut self) { unreachable!() } // can't mutate through a shared reference
}

enum MyRefMut<'a> {
    TypeA(&'a mut TypeA),
    TypeB(&'a mut TypeB),
}

impl MyRefMut<'_> {
    fn const_f(&self) {
        match self {
            Self::TypeA(t) => t.const_f(),
            Self::TypeB(t) => t.const_f(),
        }
    }
    fn mut_f(&mut self) {
        match self {
            Self::TypeA(t) => t.mut_f(),
            Self::TypeB(t) => t.mut_f(),
        }
    }
}

This is generally a yucky amount of boilerplate, but I shrug that off because having the dispatch tables like this gives me a place to do things like insert a #[cold] wrapper around certain paths, etc.
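For example, a cold wrapper on one arm looks something like this (a sketch: return values are added so the paths are observable, and it assumes TypeB is the rare variant):

```rust
struct TypeA {}
struct TypeB {}

impl TypeA {
    fn const_f(&self) -> u32 { 1 }
}
impl TypeB {
    fn const_f(&self) -> u32 { 2 }
}

enum MyRef<'a> {
    TypeA(&'a TypeA),
    TypeB(&'a TypeB),
}

impl MyRef<'_> {
    fn const_f(&self) -> u32 {
        // #[cold] hints that this path is rarely taken, so the optimizer
        // keeps it out of the hot code layout.
        #[cold]
        #[inline(never)]
        fn cold_b(t: &TypeB) -> u32 {
            t.const_f()
        }
        match self {
            Self::TypeA(t) => t.const_f(),
            Self::TypeB(t) => cold_b(t),
        }
    }
}
```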

However, having two identical sets of dispatch tables, for the mutable and const variants of the ref structs, feels like a bridge too far. Or at least the straw that's breaking this particular camel's back. Also, it can't be good for the instruction cache to have twice as many dispatch tables as needed - I don't think the compiler is able to figure that out and deduplicate them.

So I could ensure the two enums get the same discriminant tag values, and then transmute the mutable variant into the const variant. But that feels a little dangerous.
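Concretely, the dangerous version would look something like this (sketch only - it leans on #[repr(u8)] to give both enums a defined, matching layout, and on &mut T and &T sharing a representation; I haven't convinced myself it's fully sound):

```rust
struct TypeA {}
struct TypeB {}

#[repr(u8)] // pin the discriminant layout so both enums match
enum MyRef<'a> {
    TypeA(&'a TypeA),
    TypeB(&'a TypeB),
}

#[repr(u8)]
enum MyRefMut<'a> {
    TypeA(&'a mut TypeA),
    TypeB(&'a mut TypeB),
}

impl<'b> MyRefMut<'b> {
    fn as_shared<'a>(&'a self) -> MyRef<'a> {
        // SAFETY (questionable): assumes both enums have identical
        // #[repr(u8)] layouts and that &mut T and &T share a representation.
        unsafe { std::mem::transmute_copy(self) }
    }
}
```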

What would you guys do in this situation?

Thanks for any thoughts.

can you split the trait?

trait MyTraitConst {
    fn const_f(&self);
}
trait MyTraitMut {
    fn mut_f(&mut self);
}

you can also create a combined subtrait if the two traits are mostly used together:

trait MyTrait: MyTraitConst + MyTraitMut {}
impl<T: MyTraitConst + MyTraitMut> MyTrait for T {}
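A small usage sketch, assuming a generic function that needs both halves (TypeA and its counter field are made up for illustration):

```rust
trait MyTraitConst {
    fn const_f(&self);
}
trait MyTraitMut {
    fn mut_f(&mut self);
}

// blanket impl: anything implementing both halves gets the combined trait
trait MyTrait: MyTraitConst + MyTraitMut {}
impl<T: MyTraitConst + MyTraitMut> MyTrait for T {}

struct TypeA {
    calls: u32,
}
impl MyTraitConst for TypeA {
    fn const_f(&self) {}
}
impl MyTraitMut for TypeA {
    fn mut_f(&mut self) {
        self.calls += 1;
    }
}

// a single bound covers both method sets
fn use_both<T: MyTrait>(t: &mut T) {
    t.const_f();
    t.mut_f();
}
```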

can you split the trait?

I can split the trait, but I'm not seeing how that improves things. Most of the methods are still const, and I still need to be able to dispatch all the const methods from both the const ref and the mut ref.

So I'd still need duplicated dispatch tables for all the const methods, unless I'm missing something.

ok, when I suggested to split the trait, I had the impression that you have a clear separation between the usage of shared references and exclusive references, where you use MyRef in one case and MyRefMut in the other, so I imagined you would only need to implement the &mut self methods for MyRefMut.

but apparently, I had misunderstood your use case. sorry about that.

but I want to say, your design looks quite unusual to me.

typically, enum-based static dispatch would use owned enum variants, mainly to avoid boxing the values. but you are using borrowed enums, which is the part that confused me.

to be honest, I see very little benefit in using MyRef<'a> compared to regular &'a dyn MyTrait. the enum is basically a (thin) data pointer plus a discriminator, while the trait object ref is the same data pointer plus a vtable pointer. they have pretty much the same memory footprint, and contain an equivalent amount of information.

in cold paths, the indirect call may (in theory) incur an additional cache miss to load the function pointer from the vtable. however, this should not be a problem in hot paths, which is what performance-critical code typically cares about. and vtables are typically hot and read-only, so they are probably sitting in cache anyway. oh, and don't forget the prefetcher!

have you measured the performance difference to back your claims about the static dispatch vs dynamic dispatch? are you sure this is not just premature optimization?

also, there exists an optimization technique called de-virtualization: basically, the vtable address is used as a discriminator to convert dynamic dispatch into static dispatch. to my knowledge, mainstream C++ compilers typically do a decent job with it, and I would imagine rustc has similar optimizations too, but I don't know rustc well enough, so don't quote me on this.


you don't need unsafe, you can safely convert a &'a MyRefMut<'_> to MyRef<'a>, if this is what you meant:

impl<'b> MyRefMut<'b> {
    // the type `&'a Self` implicitly requires `'b: 'a`
    fn as_shared_ref<'a>(&'a self) -> MyRef<'a> {
        match self {
            Self::TypeA(t) => MyRef::TypeA(t),
            Self::TypeB(t) => MyRef::TypeB(t),
        }
    }
}

notice the lifetime annotation: this is just a reborrow from an exclusive reference.
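spelled out: MyRefMut can forward its const methods through the shared view, so the const dispatch table exists only once in source. the #[inline(always)] hints are an assumption to encourage the flattening, and return values are added so the paths are observable:

```rust
struct TypeA {}
struct TypeB {}

impl TypeA {
    fn const_f(&self) -> u32 { 1 }
}
impl TypeB {
    fn const_f(&self) -> u32 { 2 }
}

enum MyRef<'a> {
    TypeA(&'a TypeA),
    TypeB(&'a TypeB),
}

impl MyRef<'_> {
    fn const_f(&self) -> u32 {
        match self {
            Self::TypeA(t) => t.const_f(),
            Self::TypeB(t) => t.const_f(),
        }
    }
}

enum MyRefMut<'a> {
    TypeA(&'a mut TypeA),
    TypeB(&'a mut TypeB),
}

impl MyRefMut<'_> {
    #[inline(always)] // encourage the conversion to flatten into the caller
    fn as_shared_ref(&self) -> MyRef<'_> {
        match self {
            Self::TypeA(t) => MyRef::TypeA(t),
            Self::TypeB(t) => MyRef::TypeB(t),
        }
    }

    // const dispatch lives only on MyRef; forward through the shared view
    #[inline(always)]
    fn const_f(&self) -> u32 {
        self.as_shared_ref().const_f()
    }
}
```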


Thanks for your thoughts.

I see very little benefit in using MyRef<'a> compared to regular &'a dyn MyTrait. the enum is basically a (thin) data pointer plus a discriminator, while the trait object ref is the same data pointer plus a vtable pointer. they have pretty much the same memory footprint, and contain an equivalent amount of information.

You are 100% correct, save for one thing... Inlining is impossible across a dyn dispatch, and I haven't been able to get it to branch predict efficiently. (although maybe I'm missing a trick or two)

have you measured the performance difference to back your claims about the static dispatch vs dynamic dispatch? are you sure this is not just premature optimization?

Yup. It's about 30% in a lot of my top-line benchmarks. I have both implementations running side-by-side.

you don't need unsafe, you can safely convert a &'a MyRefMut<'_> to MyRef<'a>, if this is what you meant:

Yes. That's exactly what I mean, but it needs to be zero cost, since it happens every function call and the point is to save dispatch overhead. Ideally the compiler would flatten the conversion table into the dispatch table, but that requires the second outer function dispatch to be inlined.

In practice I haven't been able to get the flattening to work (even in the inlined case) but perhaps I'm screwing something up.

that's a good point. inlining enables further optimizations, so it's probably the single most important compiler transformation in terms of performance.

I don't have experience optimizing at this level, I don't even know how to measure the branch predictor.

does your data have certain access pattern? did you sort the data by type beforehand, like many games sort the entities?

wow, that's huge. now it makes so much sense for these kinds of optimizations.

my intuition is, when the two enums have compatible layout, the safe reborrow should be a no-op after type checking, once the lifetimes get erased. it is essentially no different from a transmute() (or, rather, transmute_copy()) when it comes to codegen.

I cannot prove this is always true, but here's a snippet that seems to agree with my intuition:
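roughly, the comparison is between a safe match-based conversion and a raw transmute_copy, checking that they compile to the same code. a sketch along those lines (not the exact snippet):

```rust
use std::mem::transmute_copy;

pub struct TypeA {}
pub struct TypeB {}

#[repr(u8)] // fixed layout so the transmute_copy comparison is meaningful
pub enum MyRef<'a> {
    TypeA(&'a TypeA),
    TypeB(&'a TypeB),
}

#[repr(u8)]
pub enum MyRefMut<'a> {
    TypeA(&'a mut TypeA),
    TypeB(&'a mut TypeB),
}

// safe, match-based reborrow
pub fn convert_match<'a>(r: &'a MyRefMut<'_>) -> MyRef<'a> {
    match r {
        MyRefMut::TypeA(t) => MyRef::TypeA(t),
        MyRefMut::TypeB(t) => MyRef::TypeB(t),
    }
}

// unsafe bitwise copy, for codegen comparison only
pub fn convert_transmute<'a>(r: &'a MyRefMut<'_>) -> MyRef<'a> {
    unsafe { transmute_copy(r) }
}
```

the idea is to eyeball the generated assembly of both functions (e.g. on godbolt) and confirm they are identical.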


I'm happy to report that the enum->enum match-based conversion does seem to flatten, even when I stacked quite a bit of complexity into the program.

So I think your solution is the right way to go.

Thank you!

Here is my modified version of the test code. It ensures the result of the dispatch does real work that differs by variant, and that the variant comes from a volatile read so the compiler can't collapse it - and the generated code is still comparable. Here on rust.godbolt.org