As the title says.
Do we always face this choice between dynamic dispatch and static dispatch when designing programs?
I know dynamic dispatch is a little slower than static dispatch, so I ran a test to see how much slower it is.
But the result shows dynamic dispatch being faster than static dispatch?!
The test code:

```rust
pub trait TestTrait {
    fn get_str(&self) -> &str;
}

pub struct StrTest {
    pub str: String,
}

impl TestTrait for StrTest {
    fn get_str(&self) -> &str {
        self.str.as_str()
    }
}

fn test_impl<T: TestTrait>(t: T) {
    t.get_str();
}

fn test_dyn(t: &dyn TestTrait) {
    t.get_str();
}

fn main() {
    let time = std::time::SystemTime::now();
    let t1 = StrTest { str: "test1".to_owned() };
    test_impl(t1);
    println!("time1:{:?}", time.elapsed().unwrap());

    let time1 = std::time::SystemTime::now();
    let t2 = StrTest { str: "test2".to_owned() };
    test_dyn(&t2);
    println!("time2:{:?}", time1.elapsed().unwrap());
}
```
time2 is at least 10 times faster than time1, with output like:

```
time1:14µs
time2:1µs
```
A trait object maintains a vtable at runtime, and looking up the method through it adds overhead to each call; static dispatch generates the code at compile time with no overhead at runtime.
So why is the result the opposite?
Before investigating this further, I'd recommend benchmarking it with a framework like bencher. There are a lot of factors that can affect measurements like this, especially when the times are so tiny (in µs).
For instance, how do you know that the speed difference isn't just because test_impl ran first?
Using a benchmark framework gets rid of a lot of the random factors at once. It isn't perfect, but you can get much more reliable results.
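For illustration, here is a std-only sketch of what a framework does more rigorously: warm up first, run each style many times, and route values through `std::hint::black_box` so the calls aren't optimized away. The function name `time_dispatch` and the iteration counts are my own, not from the original post.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

pub trait TestTrait {
    fn get_str(&self) -> &str;
}

pub struct StrTest {
    pub str: String,
}

impl TestTrait for StrTest {
    fn get_str(&self) -> &str {
        self.str.as_str()
    }
}

// Run each dispatch style `n` times and return (static_time, dyn_time).
// `black_box` keeps the optimizer from deleting the calls outright.
pub fn time_dispatch(n: u32) -> (Duration, Duration) {
    let t = StrTest { str: "test".to_owned() };

    // Warm-up pass, so neither timed loop pays first-run costs alone.
    for _ in 0..n {
        black_box(black_box(&t).get_str());
    }

    let start = Instant::now();
    for _ in 0..n {
        // Static dispatch: the callee is known at compile time.
        black_box(black_box(&t).get_str());
    }
    let static_time = start.elapsed();

    let dyn_t: &dyn TestTrait = &t;
    let start = Instant::now();
    for _ in 0..n {
        // Dynamic dispatch: the callee comes from the vtable.
        black_box(black_box(dyn_t).get_str());
    }
    let dyn_time = start.elapsed();

    (static_time, dyn_time)
}

fn main() {
    let (s, d) = time_dispatch(1_000_000);
    println!("static: {:?}, dyn: {:?}", s, d);
}
```

Even this is much cruder than what bencher or criterion do (no statistics, no outlier handling), but it already removes the "which one ran first" bias by warming both paths up.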
This looks like an effect of the instruction cache and the simplicity of the code. Check this playground, where the first function called is always the slowest, no matter which one it is, and on the second call of each function we get almost the same times.
As people said earlier, performance is a sensitive topic and a proper benchmark is hard to write. Another way to compare the performance of two pieces of code indirectly is to compare the assembly generated for each.
Did you compile with --release? Without it, all speed measurements are going to be misleading and you will get nonsense results. Debug builds have strange overheads from compiler and libstd implementation details that are not present in release code.
SystemTime is not suitable for benchmarks. At the very least use Instant, or better, something like bencher or criterion (https://lib.rs/criterion).
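A minimal sketch of the Instant-based pattern (the helper name `time_it` is mine). Unlike SystemTime, Instant is monotonic, so it can never go backwards and `elapsed()` needs no `unwrap()`:

```rust
use std::time::{Duration, Instant};

// Time a closure with the monotonic clock. A sketch of the pattern only;
// a real benchmark framework also handles warm-up and statistics.
pub fn time_it<F: FnOnce()>(f: F) -> Duration {
    let start = Instant::now();
    f();
    start.elapsed()
}

fn main() {
    let elapsed = time_it(|| {
        let _s = "test".to_owned();
    });
    println!("took {:?}", elapsed);
}
```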
Considering that you are also calling to_owned() inside the timed region, I'm guessing you're really just measuring the time it takes to allocate the string. It doesn't surprise me that the first allocation in the program would be slower than the second, and swapping the order of your tests does indeed seem to swap which one is faster.
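For illustration, a version that hoists the allocations out of the timed regions, keeping the post's original function signatures (I've substituted Instant for SystemTime; the timing numbers you get will still be noisy for the other reasons in this thread):

```rust
use std::time::Instant;

pub trait TestTrait {
    fn get_str(&self) -> &str;
}

pub struct StrTest {
    pub str: String,
}

impl TestTrait for StrTest {
    fn get_str(&self) -> &str {
        self.str.as_str()
    }
}

fn test_impl<T: TestTrait>(t: T) {
    t.get_str();
}

fn test_dyn(t: &dyn TestTrait) {
    t.get_str();
}

fn main() {
    // Allocate both strings before starting any clock,
    // so only the dispatched calls fall inside the timed regions.
    let t1 = StrTest { str: "test1".to_owned() };
    let t2 = StrTest { str: "test2".to_owned() };

    let time = Instant::now();
    test_impl(t1);
    println!("time1:{:?}", time.elapsed());

    let time = Instant::now();
    test_dyn(&t2);
    println!("time2:{:?}", time.elapsed());
}
```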
In fact, I could easily see the compiler just inlining both function calls and removing them completely.
When your benchmark is saying something you know isn't true, that's a good time to see whether you're actually measuring what you think you are. Note in particular that microbenchmarking is incredibly hard to do well, and in some ways impossible.
I extracted the test_dyn part of your example into a function to help isolate what it's doing:
And, in fact, the compiler is smart enough to have removed it entirely. When you make a dyn call for something that's obviously a known type, the compiler can turn it into a static call. And then in this case the compiler knows that everything it's doing is side-effect-free, and you're not using the result in any way, so it can just get rid of the whole thing:
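A sketch of how to keep the call alive (not the exact playground code): route the argument through `std::hint::black_box` and actually use the result. `black_box` hides the concrete type's provenance from the optimizer, so the vtable call can't simply be devirtualized and deleted.

```rust
use std::hint::black_box;

pub trait TestTrait {
    fn get_str(&self) -> &str;
}

pub struct StrTest {
    pub str: String,
}

impl TestTrait for StrTest {
    fn get_str(&self) -> &str {
        self.str.as_str()
    }
}

// `black_box` obscures what `t` points at, and returning the length
// makes the call's result observable instead of dead code.
pub fn test_dyn(t: &dyn TestTrait) -> usize {
    black_box(t).get_str().len()
}

fn main() {
    let t = StrTest { str: "test2".to_owned() };
    println!("{}", test_dyn(black_box(&t)));
}
```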
You are also not really testing the same thing. Your first function moves the object into the function, whereas the second one does not. The correct functions to compare are:
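The original snippet isn't reproduced here, but following that reply's reasoning it was presumably something like this, with both functions taking a reference so neither call pays for a move:

```rust
pub trait TestTrait {
    fn get_str(&self) -> &str;
}

pub struct StrTest {
    pub str: String,
}

impl TestTrait for StrTest {
    fn get_str(&self) -> &str {
        self.str.as_str()
    }
}

// Static dispatch: monomorphized per concrete type.
fn test_impl<T: TestTrait>(t: &T) {
    t.get_str();
}

// Dynamic dispatch: one function, called through the vtable.
fn test_dyn(t: &dyn TestTrait) {
    t.get_str();
}

fn main() {
    let t = StrTest { str: "test".to_owned() };
    test_impl(&t);
    test_dyn(&t);
}
```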