As the title says.
Do we always face this choice between dynamic dispatch and static dispatch when designing programs?
I know dynamic dispatch is a little slower than static dispatch, so I ran a test to see how much slower it is.
But the result shows dynamic dispatch being faster than static dispatch?!
The test code:

```rust
pub trait TestTrait {
    fn get_str(&self) -> &str;
}

pub struct StrTest {
    pub str: String,
}

impl TestTrait for StrTest {
    fn get_str(&self) -> &str {
        self.str.as_str()
    }
}

fn test_impl<T: TestTrait>(t: T) {
    t.get_str();
}

fn test_dyn(t: &dyn TestTrait) {
    t.get_str();
}

fn main() {
    let time = std::time::SystemTime::now();
    let t1 = StrTest { str: "test1".to_owned() };
    test_impl(t1);
    println!("time1:{:?}", time.elapsed().unwrap());

    let time1 = std::time::SystemTime::now();
    let t2 = StrTest { str: "test2".to_owned() };
    test_dyn(&t2);
    println!("time2:{:?}", time1.elapsed().unwrap());
}
```
time2 is at least 10 times faster than time1, with output like:

```
time1:14µs
time2:1µs
```
A trait object maintains a vtable at runtime, and looking up the method through it adds overhead to each call; static dispatch generates the code at compile time with no overhead at runtime.
So why is the result the opposite?
Before investigating this further, I'd recommend benchmarking it with a framework like bencher. There are a lot of factors that can affect measurements like this, especially when the times are so tiny (in µs).
For instance, how do you know that the speed difference isn't just because test_impl ran first?
Using a benchmark framework gets rid of a lot of the random factors at once. It isn't perfect, but you can get much more reliable results.
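For illustration, here is a std-only sketch of what a framework does more rigorously: warm up first, run each style many times, and route values through `std::hint::black_box` so the calls aren't optimized away. The function name `time_dispatch` and the iteration counts are my own, not from the original post.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

pub trait TestTrait {
    fn get_str(&self) -> &str;
}

pub struct StrTest {
    pub str: String,
}

impl TestTrait for StrTest {
    fn get_str(&self) -> &str {
        self.str.as_str()
    }
}

// Run each dispatch style `n` times and return (static_time, dyn_time).
// `black_box` keeps the optimizer from deleting the calls outright.
pub fn time_dispatch(n: u32) -> (Duration, Duration) {
    let t = StrTest { str: "test".to_owned() };

    // Warm-up pass, so neither timed loop pays first-run costs alone.
    for _ in 0..n {
        black_box(black_box(&t).get_str());
    }

    let start = Instant::now();
    for _ in 0..n {
        // Static dispatch: the callee is known at compile time.
        black_box(black_box(&t).get_str());
    }
    let static_time = start.elapsed();

    let dyn_t: &dyn TestTrait = &t;
    let start = Instant::now();
    for _ in 0..n {
        // Dynamic dispatch: the callee comes from the vtable.
        black_box(black_box(dyn_t).get_str());
    }
    let dyn_time = start.elapsed();

    (static_time, dyn_time)
}

fn main() {
    let (s, d) = time_dispatch(1_000_000);
    println!("static: {:?}, dyn: {:?}", s, d);
}
```

Even this is much cruder than what bencher or criterion do (no statistics, no outlier handling), but it already removes the "which one ran first" bias by warming both paths up.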
This looks like an effect of the instruction cache and the simplicity of the code. Check this playground, where the first function called is always the slowest, no matter which one it is, and on the second call of each function we get almost the same times.
As people said earlier, performance is a sensitive topic and a proper benchmark is hard to write. Another way to compare the performance of two pieces of code indirectly is to compare the assembly generated for each.
Did you compile with --release? Without it, all speed measurements are going to be misleading and you will get nonsense results. Debug builds have strange overheads from compiler and libstd implementation details that are not present in release code.
SystemTime is not suitable for benchmarks. At the very least use Instant, or better, something like bencher or criterion (https://lib.rs/criterion).
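A minimal sketch of the Instant-based pattern (the helper name `time_it` is mine). Unlike SystemTime, Instant is monotonic, so it can never go backwards and `elapsed()` needs no `unwrap()`:

```rust
use std::time::{Duration, Instant};

// Time a closure with the monotonic clock. A sketch of the pattern only;
// a real benchmark framework also handles warm-up and statistics.
pub fn time_it<F: FnOnce()>(f: F) -> Duration {
    let start = Instant::now();
    f();
    start.elapsed()
}

fn main() {
    let elapsed = time_it(|| {
        let _s = "test".to_owned();
    });
    println!("took {:?}", elapsed);
}
```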
Considering that you are also calling to_owned() inside the timed region, I'm guessing you're really just measuring the time it takes to allocate the string. It doesn't surprise me that the first allocation in the program would be slower than the second, and swapping the order of your tests does indeed seem to swap which one is faster.
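For illustration, a version that hoists the allocations out of the timed regions, keeping the post's original function signatures (I've substituted Instant for SystemTime; the timing numbers you get will still be noisy for the other reasons in this thread):

```rust
use std::time::Instant;

pub trait TestTrait {
    fn get_str(&self) -> &str;
}

pub struct StrTest {
    pub str: String,
}

impl TestTrait for StrTest {
    fn get_str(&self) -> &str {
        self.str.as_str()
    }
}

fn test_impl<T: TestTrait>(t: T) {
    t.get_str();
}

fn test_dyn(t: &dyn TestTrait) {
    t.get_str();
}

fn main() {
    // Allocate both strings before starting any clock,
    // so only the dispatched calls fall inside the timed regions.
    let t1 = StrTest { str: "test1".to_owned() };
    let t2 = StrTest { str: "test2".to_owned() };

    let time = Instant::now();
    test_impl(t1);
    println!("time1:{:?}", time.elapsed());

    let time = Instant::now();
    test_dyn(&t2);
    println!("time2:{:?}", time.elapsed());
}
```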
In fact, I could easily see the compiler just inlining both function calls and removing them completely.
When your benchmark is saying something you know isn't true, that's a good time to see whether you're actually measuring what you think you are. Note in particular that microbenchmarking is incredibly hard to do well, and in some ways impossible.
I extracted the test_dyn part of your example into a function to help isolate what it's doing:
And, in fact, the compiler is smart enough to have removed it entirely. When you make a dyn call for something that's obviously a known type, the compiler can turn it into a static call. And then in this case the compiler knows that everything it's doing is side-effect-free, and you're not using the result in any way, so it can just get rid of the whole thing:
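A sketch of how to keep the call alive (not the exact playground code): route the argument through `std::hint::black_box` and actually use the result. `black_box` hides the concrete type's provenance from the optimizer, so the vtable call can't simply be devirtualized and deleted.

```rust
use std::hint::black_box;

pub trait TestTrait {
    fn get_str(&self) -> &str;
}

pub struct StrTest {
    pub str: String,
}

impl TestTrait for StrTest {
    fn get_str(&self) -> &str {
        self.str.as_str()
    }
}

// `black_box` obscures what `t` points at, and returning the length
// makes the call's result observable instead of dead code.
pub fn test_dyn(t: &dyn TestTrait) -> usize {
    black_box(t).get_str().len()
}

fn main() {
    let t = StrTest { str: "test2".to_owned() };
    println!("{}", test_dyn(black_box(&t)));
}
```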
You are also not really testing the same thing. Your first function moves the object into the function, whereas the second one does not. The correct functions to compare are:
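The original snippet isn't reproduced here, but following that reply's reasoning it was presumably something like this, with both functions taking a reference so neither call pays for a move:

```rust
pub trait TestTrait {
    fn get_str(&self) -> &str;
}

pub struct StrTest {
    pub str: String,
}

impl TestTrait for StrTest {
    fn get_str(&self) -> &str {
        self.str.as_str()
    }
}

// Static dispatch: monomorphized per concrete type.
fn test_impl<T: TestTrait>(t: &T) {
    t.get_str();
}

// Dynamic dispatch: one function, called through the vtable.
fn test_dyn(t: &dyn TestTrait) {
    t.get_str();
}

fn main() {
    let t = StrTest { str: "test".to_owned() };
    test_impl(&t);
    test_dyn(&t);
}
```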