Why enum value binding in Rust is so slow?

I am currently learning Rust because I wanted to use it in project that requires a very high performance. I initially fallen in love with enums but then I started to evaluate their performance and I have found something that is really boggling me. Here is an example:

    pub fn eval(&self) -> i64 {
        match self {
            MyEnum::V1 => 1,
            MyEnum::V2(_) => 2,
            MyEnum::V3 => 3,
        }
    }
    pub fn eval2(&self) -> i64 {
        match self {
            MyEnum::V1 => 1,
            MyEnum::V2(a) => a.eval2(),
            MyEnum::V3 => 3,
        }
    }
}


fn main() {
    const EXAMPLES: usize = 10000000;
    let en = MyEnum::V1{};

    let start = Instant::now();
    let mut sum = 0;
    for _ in 0..EXAMPLES {
        sum += en.eval()
    }
    println!("enum without fields func call sum: {} at {:?}", sum, start.elapsed());

    let start = Instant::now();
    let mut sum = 0;
    for _ in 0..EXAMPLES {
        sum += en.eval2()
    }
    println!("enum with field func call sum: {} at {:?}", sum, start.elapsed());
}

Results I got:

enum without fields func call sum: 10000000 at 100ns
enum with field func call sum: 10000000 at 6.3425ms

eval function should execute exactly the same instructions as eval2 for V1 enum but it's working about 60x slower. Why is this happening? How can I optimize it?

What's your enum definition? Most likely you are seeing eval being inlined, which can't happen for eval2 if it is recursive. This means that the eval1 elides the loop entirely, and the eval2 doesn't elide the loop.

Are you running your code with cargo run or cargo run --release?

With cargo run --release

Sorry for not posting it. It is definitions:

pub enum MyEnum<'a> {
    V1,
    V2(&'a MyEnum<'a>),
    V3,
}

Ok, so yes, I think the lack of inlining recursive functions is at fault. It prevents a host of other optimizations from running, which makes your code slow.

1 Like

@RustyYato is correct. The issue is that eval2() is recursive. If you make it non-recursive, it optimizes the same as eval().

You can see that in this playground.

2 Likes

So let's compare those versions:
This is running slow:
https://play.rust-lang.org/?version=nightly&mode=release&edition=2018&gist=f95ea730a4478fb5cd8fd02c90365c80

This is running fast:
https://play.rust-lang.org/?version=nightly&mode=release&edition=2018&gist=267032a623da621fbcd8ba3a02bbdd0f

Second version has V3 and it's usage removed so eval2 is still recursive.Why is that?

You can answer that question by inspecting the generated code:

Slow: https://godbolt.org/z/6zm22h
Fast: https://godbolt.org/z/GR9FfG

In short, with only 2 variants, eval2 can only return a single possible value. The optimizer can elide the recursion entirely. With 3 variants, eval2 needs to make runtime decisions with branches.

1 Like

I think it's worth pointing out that in all of these cases, you aren't testing the performance of matching an enum, you're actually testing how smart the optimizer is in specific situations.

In general, the cost of the cost of matching enums is going to be a test for the enum tag and a branch. The optimizer can often do clever things from there.

2 Likes

Exactly. Being mindful of code generation it handy when analyzing performance tests. And it's well known that microbenchmarks tend not to be representative of real world code.

I'd also like to provide a reminder about statistical analysis when it comes to benchmarking; use criterion instead of trying to roll your own benchmark suite.

1 Like

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.