Performance of dynamic dispatching vs static dispatching

Hey Everyone,

I've been exploring ways to enhance performance, and I came across the concept of static dispatching. Before diving into implementation, I conducted an experiment to gauge its impact:

use std::time::Instant;

fn main() {
    let m = Mercedes { hp: 234 };
    let a = AMG { hp: 987 };

    let start1 = Instant::now();
    test_dynamic_dispatching(&a, &m);
    let duration1 = start1.elapsed();
    println!("Time elapsed in dynamic dispatching is: {:?}", duration1);


    let start2 = Instant::now();
    test_static_dispatching(&a, &m);
    let duration2 = start2.elapsed();
    println!("Time elapsed in static dispatching is: {:?}", duration2);

}

enum Vehicle<'a> {
    AMG(&'a AMG),
    Mercedes(&'a Mercedes),
}

fn test_dynamic_dispatching<'a>(x: &'a AMG, y: &'a Mercedes) -> &'a dyn LandCapable {
    if x.get_hp() > y.get_hp() {
        return x;
    }
    y
}

fn test_static_dispatching<'a>(x: &'a AMG, y: &'a Mercedes) -> Vehicle<'a> {
    // `x` and `y` are already references, so no extra `&` is needed.
    if x.get_hp() > y.get_hp() {
        return Vehicle::AMG(x);
    }
    Vehicle::Mercedes(y)
}

I'm not posting the whole code here, to keep it readable :smile:

Running the code gives these results, which is what I expected:

Finished dev [unoptimized + debuginfo] target(s) in 0.00s
Running target/debug/sandbox
Time elapsed in dynamic dispatching is: 90ns
Time elapsed in static dispatching is: 30ns

Then I decided to run each call in a loop:

    for _ in 0..1000 {
        test_dynamic_dispatching(&a, &m);
    }

and

    for _ in 0..1000 {
        test_static_dispatching(&a, &m);
    }

To my surprise, static dispatching was now the slower of the two:

Finished dev [unoptimized + debuginfo] target(s) in 0.00s
Running target/debug/sandbox
Time elapsed in dynamic dispatching is: 5.77µs
Time elapsed in static dispatching is: 7.033µs

Does anyone have insights into why this consistently occurs?

Thanks :smile:

> Finished dev [unoptimized + debuginfo] target(s) in 0.00s

You're running unoptimized code. Therefore, the results aren't really meaningful for what performance you can get out of the two choices.

You should try an optimized build with cargo run --release.

However, optimization can also throw off benchmarks: because the inputs are constant and the results are ignored, the compiler may optimize the code into nothing. Careful use of black_box() can be necessary to prevent this, like:

use std::hint::black_box;

for _ in 0..1000 {
    black_box(test_dynamic_dispatching(black_box(&a), black_box(&m)));
}

It's generally wise to use a benchmark framework like Criterion, which will not only automatically apply black_box() to the benchmarked function's output (and input if you use one of the input-carrying bench methods), but also do statistical analysis of the results to reduce noise.
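If you just want a quick std-only sanity check before reaching for a framework, something like the sketch below might do. The trait and struct definitions are my guesses at what the original post left out, and `time_it` is a made-up helper name, so treat it as an illustration and not as a replacement for proper statistical benchmarking. Run it with cargo run --release.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

// Hypothetical stand-ins for the definitions omitted from the original post.
trait LandCapable {
    fn get_hp(&self) -> u32;
}
struct Mercedes { hp: u32 }
struct AMG { hp: u32 }
impl LandCapable for Mercedes {
    fn get_hp(&self) -> u32 { self.hp }
}
impl LandCapable for AMG {
    fn get_hp(&self) -> u32 { self.hp }
}

fn test_dynamic_dispatching<'a>(x: &'a AMG, y: &'a Mercedes) -> &'a dyn LandCapable {
    if x.get_hp() > y.get_hp() {
        return x;
    }
    y
}

/// Calls `f` `iters` times, passing every result through `black_box`,
/// and returns the total elapsed time. (`time_it` is a made-up helper.)
fn time_it<R>(iters: u32, mut f: impl FnMut() -> R) -> Duration {
    let start = Instant::now();
    for _ in 0..iters {
        black_box(f());
    }
    start.elapsed()
}

fn main() {
    let m = Mercedes { hp: 234 };
    let a = AMG { hp: 987 };

    // Warm-up pass so one-time costs don't land in the measured run.
    let _ = time_it(1_000, || test_dynamic_dispatching(black_box(&a), black_box(&m)));

    let total = time_it(1_000_000, || {
        test_dynamic_dispatching(black_box(&a), black_box(&m))
    });
    println!("dynamic: {:?} total, {:?} per call", total, total / 1_000_000);
}
```

A single run of 1000 iterations takes only a few microseconds, which is well within timer and scheduling noise, so the sketch uses a much larger iteration count and a warm-up pass. Even so, the numbers will vary from run to run.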


Also, note that your code doesn't actually contain any dynamic dispatch. It creates a &dyn LandCapable, but never calls any methods on it. The creation is the cheapest part because it consists of returning one of two vtable pointers along with the data pointer. The real costs of dynamic dispatch are:

  • looking up and calling the functions stored in the vtable
  • the fact that those functions cannot be inlined into their callers
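To actually compare the two, the benchmark would need to call a method through each kind of value. Here is a rough sketch of what that could look like. The trait and struct definitions are assumptions (the original post doesn't show them), and `total_hp_dyn` / `total_hp_enum` are hypothetical helpers I made up for illustration.

```rust
use std::hint::black_box;

// Hypothetical definitions; the original post omits them.
trait LandCapable {
    fn get_hp(&self) -> u32;
}
struct Mercedes { hp: u32 }
struct AMG { hp: u32 }
impl LandCapable for Mercedes {
    fn get_hp(&self) -> u32 { self.hp }
}
impl LandCapable for AMG {
    fn get_hp(&self) -> u32 { self.hp }
}

enum Vehicle<'a> {
    AMG(&'a AMG),
    Mercedes(&'a Mercedes),
}

impl Vehicle<'_> {
    // Static dispatch: each arm names a concrete type, so the
    // compiler is free to inline the call.
    fn get_hp(&self) -> u32 {
        match self {
            Vehicle::AMG(v) => v.get_hp(),
            Vehicle::Mercedes(v) => v.get_hp(),
        }
    }
}

// Dynamic dispatch: `get_hp` is called through the vtable
// (unless the optimizer manages to devirtualize it).
fn total_hp_dyn(cars: &[&dyn LandCapable]) -> u64 {
    cars.iter().map(|c| u64::from(c.get_hp())).sum()
}

// Same work, but via a `match` on the enum.
fn total_hp_enum(cars: &[Vehicle<'_>]) -> u64 {
    cars.iter().map(|c| u64::from(c.get_hp())).sum()
}

fn main() {
    let m = Mercedes { hp: 234 };
    let a = AMG { hp: 987 };

    let dyn_cars: [&dyn LandCapable; 2] = [&a, &m];
    let enum_cars = [Vehicle::AMG(&a), Vehicle::Mercedes(&m)];

    // black_box keeps the optimizer from seeing through the inputs.
    println!("dyn:  {}", total_hp_dyn(black_box(&dyn_cars)));
    println!("enum: {}", total_hp_enum(black_box(&enum_cars)));
}
```

Timing the two `total_hp_*` functions in an optimized build, ideally over much larger inputs, would measure the method calls themselves and not just the construction of the reference or enum value.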