# One case of "if" performance compared to "match"

This is a simple recursive solution for the Euler Problem n.45 ( #45 Triangular, pentagonal, and hexagonal - Project Euler , adapted from euler-scala/Euler045.scala at master · samskivert/euler-scala · GitHub ):

``````fn findh(pent: u64, h: u64) -> u64 {
let hex = h * (2 * h - 1);
if hex > pent { findh(pent, h - 1)
} else if hex < pent { 0
} else { hex }
}

fn findp(tri: u64, p: u64) -> u64 {
let pent = p * (3 * p - 1 ) / 2;
if pent > tri { findp(tri, p - 1)
} else if pent < tri { 0
} else { findh(pent, p - 1) }
}

fn main() {
let e45 = (286 ..).map(|t| findp(t * (t + 1) / 2, t - 1)).find(|&t| t != 0).unwrap();
println!("{}", e45); // 1_533_776_805
}
``````

The same code using a match:

``````use std::cmp::Ordering::*;

fn findh(pent: u64, h: u64) -> u64 {
let hex = h * (2 * h - 1);
match hex.cmp(&pent) {
Less => 0,
Equal => hex,
Greater => findh(pent, h - 1)
}
}
fn findp(tri: u64, p: u64) -> u64 {
let pent = p * (3 * p - 1 ) / 2;
match pent.cmp(&tri) {
Less => 0,
Equal => findh(pent, p - 1),
Greater => findp(tri, p - 1)
}
}

fn main() {
let e45 = (286 ..).map(|t| findp(t * (t + 1) / 2, t - 1)).find(|&t| t != 0).unwrap();
println!("{}", e45); // 1_533_776_805
}
``````

On my system the if-based version, runs in about 0.35 seconds, while the match-based version runs in about 0.72 seconds, compiled with -O to remove the recursion from both.

If I compile using `#[inline(never)]` on the if-based findh() function, I see the asm (rustc 1.9.0-nightly (b12b4e4e3 2016-03-17)):

``````_ZN5findh20h23cdf1eac3cc0c77eaaE:
leaq    -1(%rdx,%rdx), %rax
.align    16, 0x90
.LBB0_1:
movq    %rax, %r8
imulq    %rdx, %r8
decq    %rdx
cmpq    %rcx, %r8
ja    .LBB0_1
xorl    %eax, %eax
cmpq    %rcx, %r8
cmovaeq    %r8, %rax
retq
``````

While the match-based version gives:

``````_ZN5findh20hbde3eb7bf24af692faaE:
leaq    -1(%rdx,%rdx), %r8
jmp    .LBB0_1
.align    16, 0x90
.LBB0_7:
.LBB0_1:
movq    %r8, %rax
imulq    %rdx, %rax
xorl    %r9d, %r9d
cmpq    %rcx, %rax
movb    \$-1, %r10b
jb    .LBB0_3
movb    \$1, %r10b
.LBB0_3:
je    .LBB0_5
movb    %r10b, %r9b
.LBB0_5:
testb    %r9b, %r9b
je    .LBB0_9
movzbl    %r9b, %eax
cmpl    \$1, %eax
je    .LBB0_7
xorl    %eax, %eax
.LBB0_9:
retq
``````

I've also tried to compile the match-based code with `-C opt-level=3 -C target-cpu=native` but the result is the same.

Can Rustc improve to compile the match-based version about as well as the if-based version, or do I have to expect this kind of match-induced pessimization?

1 Like

`if` / `match` may not be the whole story. Just reversing the two tests in the `if` version is enough to significantly change the generated code (playpen). Presumably `std::u64::cmp` has the comparisons internally in the bad order for this code.

I played a bit with `cmp` implementations: Rust Playground

In LLVM IR, it looks like `cmp` is the same as my `u64elg` (equal, less, greater), but I didn't look at rust source to confirm this is really true. That produces asm with two jumps, and so do `u64egl` and `u64gel`. The rest look better with only one jump in `u64leg`, `u64lge`, and `u64gle`. I'm not fluent in LLVM IR, but it looks like the jumps correspond directly to the `select` clauses.

I'm not sure if this information is really actionable, but I thought it was interesting.

Your `if` and `match` examples have differently ordered branches. Could you reorder the branches to be the same and measure again?

OK, the faster was the if-based version, so I've reordered the match-based version:

``````#[inline(never)]
fn findh(pent: u64, h: u64) -> u64 {
let hex = h * (2 * h - 1);
match hex.cmp(&pent) {
Greater => findh(pent, h - 1),
Less => 0,
Equal => hex,
}
}
``````

The run-time (without `#[inline(never)]`) is the same, about 0.71 seconds, and the asm is:

``````_ZN5findh20hf87ab16ab6986693faaE:
leaq    -1(%rdx,%rdx), %r8
jmp    .LBB0_1
.align    16, 0x90
.LBB0_7:
decq    %rdx
.LBB0_1:
movq    %r8, %rax
imulq    %rdx, %rax
xorl    %r9d, %r9d
cmpq    %rcx, %rax
movb    \$-1, %r10b
jb    .LBB0_3
movb    \$1, %r10b
.LBB0_3:
je    .LBB0_5
movb    %r10b, %r9b
.LBB0_5:
movzbl    %r9b, %r10d
cmpl    \$255, %r10d
je    .LBB0_8
testb    %r9b, %r9b
je    .LBB0_9
jmp    .LBB0_7
.LBB0_8:
xorl    %eax, %eax
.LBB0_9:
retq
``````

Looks like a performance bug to me.

I have not yet started filing bug reports for Rust... so are you willing to do it?