Dynamic dispatch performance in loops

Hi,

I'm aware that calling a method on a trait object takes longer than calling a method on a concrete type, due to dynamic dispatch.

But I would assume that when calling the same method inside a loop many times, the compiler could do the dynamic dispatch only once: resolve the method on the first iteration, cache the result, and reuse it for the remaining iterations.
That way the overhead of the dynamic dispatch should amortize when looping very often!?

I did some benchmarks, but it looks like Rust does the dynamic dispatch in every loop iteration:

test tests::dispatch::bench_dynamic_dispatch ... bench:  21,216,658 ns/iter (+/- 2,915,253)
test tests::dispatch::bench_static_dispatch  ... bench:   3,582,578 ns/iter (+/- 609,219)

So here are my questions:

  • Would it be possible to increase performance by doing the dynamic dispatch only once, like I described? (This would require changes to rustc.)
  • If not, are there other ways to increase performance in such a scenario? (Scenario: calling the same method on a trait object very often.)

Used rustc version: rustc 1.39.0-nightly (1dd188489 2019-09-22)

My code for the benchmarks:

#![feature(test)]
extern crate test;

#[cfg(test)]
mod tests {
    mod dispatch {
        use test::black_box;
        use test::Bencher;

        const L: usize = 10_000_000;

        trait Trait {
            fn method(&self) {
                black_box(0);
            }
        }
        struct Struct();
        impl Trait for Struct {}

        fn dynamic_dispatch(p: &Box<dyn Trait>) {
            (0..L).into_iter().for_each(|_| p.method())
        }

        fn static_dispatch<T: Trait>(p: &Box<T>) {
            (0..L).into_iter().for_each(|_| p.method())
        }

        #[bench]
        fn bench_dynamic_dispatch(b: &mut Bencher) {
            let p: Box<dyn Trait> = Box::new(Struct());
            b.iter(|| {
                dynamic_dispatch(&p);
            });
        }

        #[bench]
        fn bench_static_dispatch(b: &mut Bencher) {
            let p = Box::new(Struct());
            b.iter(|| {
                static_dispatch(&p);
            });
        }
    }
}

I don't see how you could do this. If the function isn't known until runtime, then at the bare minimum you'll always need to execute a call instruction to invoke the function pointer.

The problem with dynamic dispatch isn't necessarily the "dynamic" part; jumping to a different bit of code can be quite fast, especially once the branch predictor kicks in and sees you're always calling the method on the same type. The real problem is that dynamic dispatch acts as a black box to the compiler, inhibiting other optimisations which could speed things up significantly (e.g. inlining Trait::method() and dropping the call to black_box() because the result is never used).
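To illustrate that point, here's a minimal sketch (separate from the benchmark above; `dyn_loop` and `generic_loop` are made-up names): moving the trait bound out to a generic function dispatches once at the call boundary and lets the optimizer inline the method body for each concrete type.

```rust
trait Trait {
    fn method(&self) -> u64;
}

struct Struct;

impl Trait for Struct {
    fn method(&self) -> u64 {
        1
    }
}

// Dynamic dispatch on every iteration: each call goes through the
// vtable and is opaque to the optimizer.
fn dyn_loop(p: &dyn Trait, n: u64) -> u64 {
    (0..n).map(|_| p.method()).sum()
}

// Monomorphized per concrete type: `method` can be inlined, and the
// optimizer is free to fold the whole loop down to a constant.
fn generic_loop<T: Trait>(p: &T, n: u64) -> u64 {
    (0..n).map(|_| p.method()).sum()
}

fn main() {
    assert_eq!(dyn_loop(&Struct, 10), 10);
    assert_eq!(generic_loop(&Struct, 10), 10);
}
```

Both functions compute the same result; the difference is only in what the compiler is allowed to see through.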

You may also want to look into Profile-Guided Optimisation (PGO). I remember watching a C++ talk ages ago where they showed how a compiler can "remove" the dynamic dispatch by adding special cases based on sample runs. It essentially compiled an if-else chain checking whether the argument had a known type so it could invoke the method using static dispatch, falling back to dynamic dispatch otherwise.
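You can hand-roll the same idea in Rust. The sketch below (the `as_any` hook and the type names are mine, not anything PGO tooling generates) checks for the "hot" concrete type once, runs a statically dispatched loop for it, and falls back to dynamic dispatch for everything else:

```rust
use std::any::Any;

trait Handler {
    fn handle(&self) -> u64;
    // Escape hatch so we can downcast a `&dyn Handler`.
    fn as_any(&self) -> &dyn Any;
}

struct Fast;
impl Handler for Fast {
    fn handle(&self) -> u64 { 1 }
    fn as_any(&self) -> &dyn Any { self }
}

struct Slow;
impl Handler for Slow {
    fn handle(&self) -> u64 { 2 }
    fn as_any(&self) -> &dyn Any { self }
}

// Manual "devirtualization": one type check up front, like the
// if-else chain a PGO build emits for the types seen in sample runs.
fn run(h: &dyn Handler, n: u64) -> u64 {
    if let Some(fast) = h.as_any().downcast_ref::<Fast>() {
        // Hot path: `fast` is a concrete `&Fast`, so this loop is
        // statically dispatched and can be inlined.
        (0..n).map(|_| fast.handle()).sum()
    } else {
        // Cold path: ordinary dynamic dispatch per iteration.
        (0..n).map(|_| h.handle()).sum()
    }
}

fn main() {
    assert_eq!(run(&Fast, 10), 10);
    assert_eq!(run(&Slow, 10), 20);
}
```

The trade-off is that you have to guess the hot type yourself, whereas PGO derives it from profiling data.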


Thanks, that makes sense!
I also did a run with opt-level=0, and the difference was only ~1.5x (instead of ~7x with opt-level=3). This suggests that most of the performance loss comes from the compiler being unable to optimize across the dynamic call.

Kind of like what a JIT does.

Trait-object (dyn-based) dynamic dispatch cannot be explicitly "cached" by the programmer, but an enum-based one can:

use ::std::fmt::Debug;

trait Handler : Debug {
    fn handle (self: &'_ mut Self, value: u8)
    ;
}

impl Handler for Vec<u8> {
    fn handle (self: &'_ mut Self, value: u8)
    {
        self.push(value);
    }
}

#[derive(Debug)]
struct SumComputer(u64);

impl Handler for SumComputer {
    fn handle (self: &'_ mut Self, value: u8)
    {
        self.0 += u64::from(value);
    }
}

Then, rather than having a dyn Handler, you can have your own enum:

#[derive(Debug)]
enum MyHandler {
    Vec(Vec<u8>),
    SumComputer(SumComputer),
}

then dispatch before the loop body to an appropriate concrete type:

fn handle_elements (
    handler: &'_ mut MyHandler, // instead of &mut dyn Handler
    elems: impl IntoIterator<Item = impl Into<u8>>,
)
{
    let elems = elems.into_iter().map(Into::into);
    use MyHandler::*;
    match *handler {
        | Vec(ref mut it) => {
            elems.for_each(|value| it.handle(value));
        },
        | SumComputer(ref mut it) => {
            elems.for_each(|value| it.handle(value));
        },
    }
    dbg!(handler);
}

The only issue with this technique is code repetition, but since it lends itself very well to duck typing, macros provide a good solution:

macro_rules! with_inner {(
    $handler:expr => |$ident:ident| $expr:expr
) => (
    match $handler {
        | &mut MyHandler::Vec(ref mut $ident) => {
            $expr
        },
        | &mut MyHandler::SumComputer(ref mut $ident) => {
            $expr
        },
    }
)}

fn handle_elements (
    handler: &'_ mut MyHandler, // instead of &mut dyn Handler
    elems: impl IntoIterator<Item = impl Into<u8>>,
)
{
    let elems = elems.into_iter().map(Into::into);
    with_inner!(handler => |it| {
        elems.for_each(|value| it.handle(value));
    });
    dbg!(handler);
}