One case where "if" outperforms "match"


#1

This is a simple recursive solution to Project Euler Problem 45 (find the next number after 40755 that is simultaneously triangular, pentagonal, and hexagonal: https://projecteuler.net/index.php?section=problems&id=45 ), adapted from https://github.com/samskivert/euler-scala/blob/master/Euler045.scala :

// Search downward from index h for a hexagonal number equal to pent;
// return 0 if none matches exactly.
fn findh(pent: u64, h: u64) -> u64 {
    let hex = h * (2 * h - 1);
    if hex > pent { findh(pent, h - 1)
    } else if hex < pent { 0
    } else { hex }
}

// Search downward from index p for a pentagonal number equal to tri;
// on an exact match, check whether the value is also hexagonal.
fn findp(tri: u64, p: u64) -> u64 {
    let pent = p * (3 * p - 1) / 2;
    if pent > tri { findp(tri, p - 1)
    } else if pent < tri { 0
    } else { findh(pent, p - 1) }
}

fn main() {
    // The triangular numbers start at t = 286, since T(285) = 40755 is the
    // value given in the problem statement.
    let e45 = (286 ..).map(|t| findp(t * (t + 1) / 2, t - 1)).find(|&t| t != 0).unwrap();
    println!("{}", e45); // 1_533_776_805
}
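
For reference, the figurate-number formulas inlined above, written out as standalone helpers (my names; this is just a restatement of what the code computes):

fn tri(n: u64) -> u64 { n * (n + 1) / 2 }      // triangular
fn pent(n: u64) -> u64 { n * (3 * n - 1) / 2 } // pentagonal
fn hex(n: u64) -> u64 { n * (2 * n - 1) }      // hexagonal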

The same code using a match:

use std::cmp::Ordering::*;

fn findh(pent: u64, h: u64) -> u64 {
    let hex = h * (2 * h - 1);
    match hex.cmp(&pent) {
        Less => 0,
        Equal => hex,
        Greater => findh(pent, h - 1),
    }
}

fn findp(tri: u64, p: u64) -> u64 {
    let pent = p * (3 * p - 1) / 2;
    match pent.cmp(&tri) {
        Less => 0,
        Equal => findh(pent, p - 1),
        Greater => findp(tri, p - 1),
    }
}

fn main() {
    let e45 = (286 ..).map(|t| findp(t * (t + 1) / 2, t - 1)).find(|&t| t != 0).unwrap();
    println!("{}", e45); // 1_533_776_805
}

On my system the if-based version runs in about 0.35 seconds, while the match-based version runs in about 0.72 seconds; both were compiled with -O, which turns the tail recursion into a loop in each.
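
To make that concrete, here is roughly the loop that the tail-recursive findh gets optimized into (a hand-written sketch, not rustc's actual output):

// Iterative equivalent of the tail-recursive findh:
fn findh_loop(pent: u64, mut h: u64) -> u64 {
    loop {
        let hex = h * (2 * h - 1);
        if hex > pent {
            h -= 1;     // keep searching downward
        } else if hex < pent {
            return 0;   // overshot: no exact match
        } else {
            return hex; // pent is also hexagonal
        }
    }
}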

If I mark the if-based findh() with #[inline(never)] and compile, I get this asm (rustc 1.9.0-nightly (b12b4e4e3 2016-03-17)):

_ZN5findh20h23cdf1eac3cc0c77eaaE:
    leaq    -1(%rdx,%rdx), %rax
    .align    16, 0x90
.LBB0_1:
    movq    %rax, %r8
    imulq    %rdx, %r8
    decq    %rdx
    addq    $-2, %rax
    cmpq    %rcx, %r8
    ja    .LBB0_1
    xorl    %eax, %eax
    cmpq    %rcx, %r8
    cmovaeq    %r8, %rax
    retq

While the match-based version gives:

_ZN5findh20hbde3eb7bf24af692faaE:
    leaq    -1(%rdx,%rdx), %r8
    jmp    .LBB0_1
    .align    16, 0x90
.LBB0_7:
    addq    $-1, %rdx
    addq    $-2, %r8
.LBB0_1:
    movq    %r8, %rax
    imulq    %rdx, %rax
    xorl    %r9d, %r9d
    cmpq    %rcx, %rax
    movb    $-1, %r10b
    jb    .LBB0_3
    movb    $1, %r10b
.LBB0_3:
    je    .LBB0_5
    movb    %r10b, %r9b
.LBB0_5:
    testb    %r9b, %r9b
    je    .LBB0_9
    movzbl    %r9b, %eax
    cmpl    $1, %eax
    je    .LBB0_7
    xorl    %eax, %eax
.LBB0_9:
    retq

I’ve also tried compiling the match-based code with -C opt-level=3 -C target-cpu=native, but the result is the same: the Ordering discriminant is still materialized in a register (the -1/0/1 values in %r10b and %r9b above) and re-tested, instead of branching directly on the comparison flags.

Can rustc be improved to compile the match-based version about as well as the if-based one, or do I have to expect this kind of match-induced pessimization?


#2

if vs. match may not be the whole story. Just reversing the two tests in the if version is enough to significantly change the generated code (playpen). Presumably u64's Ord::cmp does its internal comparisons in the bad order for this code.
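
For concreteness, the "reversed" if version is presumably something like this (my reconstruction; the playpen snippet isn't quoted in the thread):

// findh from #1 with the two tests swapped:
fn findh(pent: u64, h: u64) -> u64 {
    let hex = h * (2 * h - 1);
    if hex < pent { 0
    } else if hex > pent { findh(pent, h - 1)
    } else { hex }
}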


#3

I played a bit with cmp implementations: http://is.gd/cRuiq4

In the LLVM IR, cmp looks the same as my u64elg (equal, less, greater), though I didn't check the Rust source to confirm that it really is. That variant produces asm with two jumps, and so do u64egl and u64gel. The rest look better, with only one jump each: u64leg, u64lge, and u64gle. I'm not fluent in LLVM IR, but the jumps seem to correspond directly to the select clauses.
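
To illustrate the naming scheme, here are reconstructions of two of those variants (the linked playground code isn't quoted here, so these are my guesses based on the names):

use std::cmp::Ordering::{self, *};

// elg: test equal, then less, then greater -- per the measurements above,
// this shape (apparently matching cmp) compiles to two jumps:
fn u64elg(a: u64, b: u64) -> Ordering {
    if a == b { Equal } else if a < b { Less } else { Greater }
}

// leg: test less, then equal, then greater -- one of the orderings that
// reportedly compiles down to a single jump:
fn u64leg(a: u64, b: u64) -> Ordering {
    if a < b { Less } else if a == b { Equal } else { Greater }
}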

I’m not sure if this information is really actionable, but I thought it was interesting.


#4

Your if and match examples have differently ordered branches. Could you reorder the branches to be the same and measure again?


#5

OK, the if-based version was the faster one, so I've reordered the arms of the match-based version to use the same branch order:

use std::cmp::Ordering::*;

#[inline(never)]
fn findh(pent: u64, h: u64) -> u64 {
    let hex = h * (2 * h - 1);
    match hex.cmp(&pent) {
        Greater => findh(pent, h - 1),
        Less => 0,
        Equal => hex,
    }
}

The run-time (measured without #[inline(never)]) is the same, about 0.71 seconds, and the asm is:

_ZN5findh20hf87ab16ab6986693faaE:
    leaq    -1(%rdx,%rdx), %r8
    jmp    .LBB0_1
    .align    16, 0x90
.LBB0_7:
    decq    %rdx
    addq    $-2, %r8
.LBB0_1:
    movq    %r8, %rax
    imulq    %rdx, %rax
    xorl    %r9d, %r9d
    cmpq    %rcx, %rax
    movb    $-1, %r10b
    jb    .LBB0_3
    movb    $1, %r10b
.LBB0_3:
    je    .LBB0_5
    movb    %r10b, %r9b
.LBB0_5:
    movzbl    %r9b, %r10d
    cmpl    $255, %r10d
    je    .LBB0_8
    testb    %r9b, %r9b
    je    .LBB0_9
    jmp    .LBB0_7
.LBB0_8:
    xorl    %eax, %eax
.LBB0_9:
    retq


#6

Looks like a performance bug to me.


#7

I haven't started filing bug reports for Rust yet… are you willing to file this one?