I am currently learning Rust because I want to use it in a project that requires very high performance. I initially fell in love with enums, but then I started to evaluate their performance and found something that is really puzzling me. Here is an example:
use std::time::Instant;

impl MyEnum {
    pub fn eval(&self) -> i64 {
        match self {
            MyEnum::V1 => 1,
            MyEnum::V2(_) => 2,
            MyEnum::V3 => 3,
        }
    }

    pub fn eval2(&self) -> i64 {
        match self {
            MyEnum::V1 => 1,
            MyEnum::V2(a) => a.eval2(),
            MyEnum::V3 => 3,
        }
    }
}
fn main() {
    const EXAMPLES: usize = 10000000;
    let en = MyEnum::V1;
    let start = Instant::now();
    let mut sum = 0;
    for _ in 0..EXAMPLES {
        sum += en.eval();
    }
    println!("enum without fields func call sum: {} at {:?}", sum, start.elapsed());
    let start = Instant::now();
    let mut sum = 0;
    for _ in 0..EXAMPLES {
        sum += en.eval2();
    }
    println!("enum with field func call sum: {} at {:?}", sum, start.elapsed());
}
Results I got:
enum without fields func call sum: 10000000 at 100ns
enum with field func call sum: 10000000 at 6.3425ms
The eval function should execute exactly the same instructions as eval2 for the V1 variant, yet eval2 runs tens of thousands of times slower here (100 ns vs ~6.3 ms). Why is this happening? How can I optimize it?
What's your enum definition? Most likely you are seeing eval being inlined, which can't happen for eval2 if it is recursive. This means that for eval the compiler can elide the loop entirely (the whole sum is computed at compile time), while for eval2 it can't.
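My guess is that it's something along these lines, with V2 boxing another MyEnum so the type has a finite size (just a guess, not the actual code from the post):

enum MyEnum {
    V1,
    V2(Box<MyEnum>), // guessed: the payload is another MyEnum behind a Box
    V3,
}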
OK, so yes, I think the inability to inline recursive functions is at fault. It prevents a host of other optimizations from running, which makes your code slow.
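If the recursion is the blocker, one option is to walk the chain with a loop instead of recursing, which the compiler is free to inline. A sketch, assuming the boxed V2 definition guessed above (eval2_iter is just an illustrative name):

impl MyEnum {
    // Iterative version of eval2: with no recursion, the compiler can inline
    // it and, for a constant input like MyEnum::V1, fold the whole benchmark
    // loop just like it does for eval.
    pub fn eval2_iter(&self) -> i64 {
        let mut cur = self;
        loop {
            match cur {
                MyEnum::V1 => return 1,
                MyEnum::V2(inner) => cur = inner.as_ref(), // &Box<MyEnum> -> &MyEnum
                MyEnum::V3 => return 3,
            }
        }
    }
}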
In short, with only 2 variants, eval2 can only return a single possible value. The optimizer can elide the recursion entirely. With 3 variants, eval2 needs to make runtime decisions with branches.
I think it's worth pointing out that in all of these cases you aren't testing the performance of matching an enum; you're actually testing how smart the optimizer is in specific situations.
In general, the cost of matching on an enum is going to be a test of the enum tag and a branch. The optimizer can often do clever things from there.
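As a rough mental model only (rustc picks the actual layout and may use jump tables, niche optimizations, and so on), a match behaves roughly like reading an explicit tag field and branching on it:

#[repr(u8)]
enum Tag { V1, V2, V3 }

struct ManualEnum {
    tag: Tag,
    // in the real layout, V2's payload would live alongside the tag
}

fn eval_manual(e: &ManualEnum) -> i64 {
    // Conceptually what a match compiles down to: load the tag, branch.
    match e.tag {
        Tag::V1 => 1,
        Tag::V2 => 2,
        Tag::V3 => 3,
    }
}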
Exactly. Being mindful of code generation is handy when analyzing performance tests. And it's well known that microbenchmarks tend not to be representative of real world code.
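One practical way to keep the optimizer from folding away a microbenchmark like the one above is std::hint::black_box (stable since Rust 1.66). A sketch, where bench_eval is just an illustrative helper:

use std::hint::black_box;
use std::time::Instant;

// black_box hides the value from the optimizer, so it can no longer prove
// the input is a compile-time constant and fold the loop away.
fn bench_eval(en: &MyEnum, iters: usize) -> i64 {
    let start = Instant::now();
    let mut sum = 0;
    for _ in 0..iters {
        sum += black_box(en).eval();
    }
    println!("sum: {} in {:?}", sum, start.elapsed());
    sum
}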
I'd also like to provide a reminder about statistical analysis when it comes to benchmarking; use criterion instead of trying to roll your own benchmark suite.
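Something along these lines, as a minimal sketch (the crate name mycrate and the benchmark names are just placeholders), would go in benches/eval.rs with criterion listed as a dev-dependency:

use criterion::{black_box, criterion_group, criterion_main, Criterion};
use mycrate::MyEnum; // placeholder: import MyEnum from wherever it actually lives

fn enum_benches(c: &mut Criterion) {
    let en = MyEnum::V1;
    // criterion runs each closure many times and does the statistics for you
    c.bench_function("eval", |b| b.iter(|| black_box(&en).eval()));
    c.bench_function("eval2", |b| b.iter(|| black_box(&en).eval2()));
}

criterion_group!(benches, enum_benches);
criterion_main!(benches);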