Why is dynamic dispatch faster than static dispatch?

As the title says.
When designing programs, we always face the choice between dynamic dispatch and static dispatch.
I know that dynamic dispatch is a little slower than static dispatch,
so I wrote a test to see how much slower.
But the result shows dynamic dispatch being faster than static dispatch?!
The test code:

pub struct StrTest {
    pub str: String,
}

impl TestTrait for StrTest {
    fn get_str(&self) -> &str {
        self.str.as_str()
    }
}

pub trait TestTrait {
    fn get_str(&self) -> &str;
}

// Static dispatch: monomorphized for each concrete T.
fn test_impl<T: TestTrait>(t: T) {
    t.get_str();
}

// Dynamic dispatch: called through a trait object.
fn test_dyn(t: &dyn TestTrait) {
    t.get_str();
}

fn main() {
    let time = std::time::SystemTime::now();
    let t1 = StrTest { str: "test1".to_owned() };
    test_impl(t1);
    println!("time1:{:?}", time.elapsed().unwrap());

    let time1 = std::time::SystemTime::now();
    let t2 = StrTest { str: "test2".to_owned() };
    test_dyn(&t2);
    println!("time2:{:?}", time1.elapsed().unwrap());
}

time2 is at least 10 times faster than time1.
The output looks like this:
"time1:14µs
time2:1µs"

A trait object carries a vtable, which adds lookup overhead every time a method is invoked, while static dispatch generates the code at compile time with no overhead at run time.
So why do I get this result?


Before investigating this further, I'd recommend benchmarking it with a framework like bencher. There are a lot of factors which can affect measurements like this, especially when the times are so tiny (in µs).

For instance, how do you know that the speed difference isn't just because test_impl ran first?

Using a benchmark framework allows getting rid of a lot of the random factors at once - it isn't perfect, but you can get much more reliable results.


This looks like an effect of the instruction cache and the simplicity of the code. Check this playground, where the first function called is always the slowest no matter which one it was, and on the second call of each method we get almost the same times.
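
The playground itself is not reproduced in this thread, so the following is only a sketch of that kind of experiment, assuming the same `StrTest` and `TestTrait` types from the question. The `time_once` helper is hypothetical, and the functions return the string length so the call has a result the optimizer cannot trivially discard.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

pub trait TestTrait {
    fn get_str(&self) -> &str;
}

pub struct StrTest {
    pub str: String,
}

impl TestTrait for StrTest {
    fn get_str(&self) -> &str {
        self.str.as_str()
    }
}

fn test_impl<T: TestTrait>(t: &T) -> usize {
    t.get_str().len()
}

fn test_dyn(t: &dyn TestTrait) -> usize {
    t.get_str().len()
}

/// Hypothetical helper (not from the thread): times a single call of `f`.
fn time_once(f: impl FnOnce() -> usize) -> (usize, Duration) {
    let start = Instant::now();
    let out = black_box(f());
    (out, start.elapsed())
}

fn main() {
    let t = StrTest { str: "test".to_owned() };
    // Two rounds: if only the very first measurement stands out, the
    // difference comes from ordering, not from the kind of dispatch.
    for round in 0..2 {
        let (_, d_impl) = time_once(|| test_impl(black_box(&t)));
        let (_, d_dyn) = time_once(|| test_dyn(black_box(&t)));
        println!("round {round}: impl {d_impl:?}, dyn {d_dyn:?}");
    }
}
```

Swapping the two `time_once` lines is a quick way to check whether the slow measurement follows the function or the position.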


As people said earlier, performance is a sensitive topic and proper benchmarking is hard. Another way to compare the performance of code indirectly is to compare the assembly generated for each version.

Nothing can be faster than a single ret.


Yes, I have tested it many times, and it is as you said.

Thanks, I will try it.

Oh, I don't know much about assembly, but you are right, nothing can be faster than ret.

  1. Did you compile with --release? Without it all speed measurements are going to be misleading and you will get nonsense results. Debug code has weird overheads from compiler and libstd implementation details, which are not present in release code.

  2. SystemTime is not suitable for benchmarks. At least use Instant, or something like bencher or https://lib.rs/criterion
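
As a rough illustration of the second point, here is a minimal std-only sketch that uses `Instant` (which is monotonic, unlike `SystemTime`) and repeats the call many times so the total is well above the timer resolution. The `time_loop` helper and the iteration count are assumptions made for this example; a framework such as criterion handles warm-up and statistics far more carefully.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical helper: runs `f` `iters` times as a warm-up, then
/// `iters` more times under the timer, and returns the timed total.
fn time_loop<R>(iters: u32, mut f: impl FnMut() -> R) -> Duration {
    for _ in 0..iters {
        black_box(f());
    }
    let start = Instant::now();
    for _ in 0..iters {
        // black_box keeps the result "used" so the call is less likely
        // to be optimized away (it is best-effort, not a guarantee).
        black_box(f());
    }
    start.elapsed()
}

fn main() {
    let s = String::from("test1");
    let iters = 1_000_000;
    let total = time_loop(iters, || black_box(&s).as_str().len());
    println!("about {:?} per call", total / iters);
}
```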


Yes, I compiled with --release.

Considering that you are also calling to_owned() inside the timed region, I'm guessing you're really just measuring the time it takes to allocate the string. It doesn't surprise me that the first allocation in the program would be slower than the second, and swapping the order of your tests does indeed seem to swap which one is faster.

In fact I could easily see it just inlining both function calls and completely removing them.
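
One way to check this guess is to time the allocation and the method call separately. The sketch below assumes a hypothetical `split_timing` helper that is not part of the original code, and a single call is still close to the timer resolution, so treat the numbers as a rough indication only.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical helper: returns the string length plus the time spent
/// allocating the String and the time spent in the call on it.
fn split_timing(text: &str) -> (usize, Duration, Duration) {
    // Timed region 1: only the allocation done by to_owned().
    let start = Instant::now();
    let s: String = black_box(text.to_owned());
    let alloc_time = start.elapsed();

    // Timed region 2: only the call, on the already-built String.
    let start = Instant::now();
    let len = black_box(black_box(&s).as_str().len());
    let call_time = start.elapsed();

    (len, alloc_time, call_time)
}

fn main() {
    let (len, alloc_time, call_time) = split_timing("test1");
    println!("len {len}: allocation {alloc_time:?}, call {call_time:?}");
}
```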


When your benchmark is saying something you know isn't true, that's a good time to see whether you're actually measuring what you think you are. Note in particular that microbenchmarking is incredibly hard to do well, and in some ways impossible.

I extracted the test_dyn part of your example into a function to help isolate what it's doing:

https://rust.godbolt.org/z/HsRfS9

And, in fact, the compiler is smart enough to have removed it entirely. When you make a dyn call for something that's obviously a known type, the compiler can turn it into a static call. And then in this case the compiler knows that everything it's doing is side-effect-free, and you're not using the result in any way, so it can just get rid of the whole thing:

example::demo:
        ret
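
The source behind the godbolt link is not shown in the thread, so the following is only a guess at its shape: a `demo` function that passes a value of known type to `test_dyn` and ignores the result. `demo_observed` is a hypothetical variant added here to show how returning the result and hiding the concrete type behind `black_box` gives the optimizer less to remove.

```rust
use std::hint::black_box;

pub trait TestTrait {
    fn get_str(&self) -> &str;
}

pub struct StrTest {
    pub str: String,
}

impl TestTrait for StrTest {
    fn get_str(&self) -> &str {
        self.str.as_str()
    }
}

fn test_dyn(t: &dyn TestTrait) {
    t.get_str();
}

// The concrete type is visible here and the result is unused, so an
// optimizing build is free to reduce the whole body to nothing.
pub fn demo(t: &StrTest) {
    test_dyn(t);
}

// Hypothetical variant: the trait object goes through black_box and
// the length is returned, so the call has an observable effect.
pub fn demo_observed(t: &StrTest) -> usize {
    let erased: &dyn TestTrait = t;
    black_box(erased).get_str().len()
}

fn main() {
    let t = StrTest { str: "test2".to_owned() };
    demo(&t);
    println!("{}", demo_observed(&t));
}
```

Comparing the release-mode assembly of `demo` and `demo_observed` should make the difference visible, though the exact output depends on the compiler version.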

You are also not really testing the same thing. Your first function moves the object into the function, whereas the second one does not. The correct functions to compare are:

fn test_impl<T:TestTrait>(t:&T){
    t.get_str();
}

fn test_dyn(t:&dyn TestTrait){
    t.get_str();
}
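
A sketch of how these two reference-taking versions might be compared on equal terms, assuming the types from the question. The `sum_static` and `sum_dyn` names and the iteration count are made up for this example; each loop returns a sum so the calls cannot simply be dropped.

```rust
use std::hint::black_box;
use std::time::Instant;

pub trait TestTrait {
    fn get_str(&self) -> &str;
}

pub struct StrTest {
    pub str: String,
}

impl TestTrait for StrTest {
    fn get_str(&self) -> &str {
        self.str.as_str()
    }
}

// Static dispatch: both versions borrow, so neither moves the value.
fn sum_static<T: TestTrait>(t: &T, iters: usize) -> usize {
    (0..iters).map(|_| black_box(t).get_str().len()).sum()
}

// Dynamic dispatch through a trait object reference.
fn sum_dyn(t: &dyn TestTrait, iters: usize) -> usize {
    (0..iters).map(|_| black_box(t).get_str().len()).sum()
}

fn main() {
    // The String is built once, outside both timed regions.
    let t = StrTest { str: "test1".to_owned() };
    let iters = 10_000_000;

    let start = Instant::now();
    let a = sum_static(&t, iters);
    let static_time = start.elapsed();

    let start = Instant::now();
    let b = sum_dyn(&t, iters);
    let dyn_time = start.elapsed();

    println!("static: {a} in {static_time:?}");
    println!("dyn:    {b} in {dyn_time:?}");
}
```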

Yes, thanks for pointing out the mistake. I have modified the code like this, but the result is the same as before.

Thank you, you made me realize that looking at the compiled code is a good way to understand this :grinning:
