Why is this prime number benchmark 30% slower than C++?


#1

Hey,

first of all: I am completly new to Rust. It might absolutly be the case that I implemented a “non optimal version” in rust.

I made a little benchmark for rust and c++. Both benchmarks check the numbers from 2 to 100000 if they are prime numbers and if they are they are added to a counter. Both implementations use the same algorithm. The algorithm itself is pretty bad, I know that you could do a “Sieve of Eratosthenes” approach instead or atleast calc the sqrt of the number to check - but thats not the point of this benchmark. Even though the algorithm itself is bad the results should be more or less equal because “the book” states: “Rust is a systems programming language focused on three goals: safety, speed,
and concurrency.”. However, they are not.

In my environment the C++ code took around 1.2 seconds (Visual Studio, default release settings).
Rust took around 1.7 seconds (cargo run --release).

2 Questions:

  1. Why is rust around 30% slower in this benchmark?
  2. Is there some way to improve the speed of the rust version without changeing the algorithm?

Rust src:

extern crate time;

fn main(){
    let mut anti_optimization = 0;
    
    let t1 = time::now();
    for i in 2..100000{
        if is_prim(i){
            anti_optimization += i;
        }
    }
    let t2 = time::now();
    let diff = t2-t1;
    println!("{}", diff.num_milliseconds());
    println!("{}", anti_optimization);
}

fn is_prim(x: i32) -> bool{
    for i in 2..x{
        if x%i == 0 {
            return false;
        }
    }
    true
}

C++ src:

#include <iostream>
#include <chrono>

bool isPrim(int n);

int main()
{
    int antiOptimization = 0;
    auto t1 = std::chrono::high_resolution_clock::now();
    for (int i = 2; i < 100000; i++) {
        if (isPrim(i)) {
            antiOptimization += i;
        }
    }
    auto t2 = std::chrono::high_resolution_clock::now();
    auto diff = t2 - t1;
    int ms = std::chrono::duration_cast<std::chrono::milliseconds>(diff).count();
    std::cout << ms << std::endl;
    std::cout << antiOptimization << std::endl;
    while (true);
    return 0;
}

bool isPrim(int n)
{
    for (int i = 2; i < n; i++) {
        if (n%i == 0) {
            return false;
        }
    }
    return true;
}

Thanks!


#2

Just a guess, Rust uses checked arithmetic operations by default. Have you tried to use unchecked ones?


#3

Hm, I get pretty much identical results here on Linux (using clang++ -O2 to compile C++):

user@UNIT-326 [06:01:49 PM] [~/trash/prime] [master *]
-> % ./a.out               
1505
454396537
user@UNIT-326 [06:02:22 PM] [~/trash/prime] [master *]
-> % ./target/release/prime
1510
454396537

My guess is that it is the difference between Clang and MVCC and not between Rust and C++. It would be interesting to hear your results for C++ with clang!

As for the possible improvements of Rust code, here is an optimization for the source code size:

extern crate time;

fn main(){
    let t1 = time::now();
    let anti_optimization = (2..100000)
        .filter(|&i| is_prim(i))
        .fold(0, |x, y| x + y); // .sum is unstable :(
    let t2 = time::now();
    let diff = t2-t1;
    println!("{}", diff.num_milliseconds());
    println!("{}", anti_optimization);
}

fn is_prim(x: i32) -> bool{
    return (2..x).all(|i| x % i != 0)
}

:slight_smile:


#4

Both times are almost equal on my machine, too:

$ cargo run --release
Compiling prime v0.0.1 (file:///tmp/prime)
Running `target/release/prime`
2068
454396537
$ g++ -Wall -O3 prime.cpp -o prime &&
2113
454396537

In fact, the C++ code is a bit slower. I’m using rustc 1.8.0 (db2939409 2016-04-11) and g++ (GCC) 6.1.1 20160501.


#5

@Brotcrunsher: You can add the language shortcut to code blocks to get syntax highlighting:


```rust
fn main() ...

#6

Surprisingly, this version of is_prim makes the Rust program run about 30% faster on my machine! I’m using rustc 1.10.0-nightly (cd6a40017 2016-05-16) on x86_64 Linux with opt-level = 3.

Side note: If you want the Rust code to use a high-resolution monotonic clock (like the C++ code does), use std::time::Instant instead of time::now. Though at this scale it shouldn’t make a difference most of the time.

Here’s the Rust code I ended up with:

use std::ops::Add;
use std::time::Instant;

fn main() {
    let t1 = Instant::now();
    let anti_optimization = (2..100_000).filter(is_prim).fold(0, i32::add);
    let diff = t1.elapsed();
    println!("{}{:03}", diff.as_secs(), diff.subsec_nanos() / 1_000_000);
    println!("{}", anti_optimization);
}

fn is_prim(&x: &i32) -> bool {
    (2..x).all(|i| x % i != 0)
}

Update: On the same machine, the C++ version compiled with g++ 5.3.1 is the same speed as this Rust version (difference of < 0.5%, which is within the statistical noise). When compiled with clang++ 3.8.0, the C++ program is 3–4% slower. (Older versions of clang++ produce even slower code.) All benchmarking done with -O3.


#7

The original C++ (with clang++) and your latest Rust (with 1.10) are exactly the same speed on my Linux x86-64 machine.


#8

Fixed, thanks. :slight_smile:


#9

I wonder what the difference is…


#10

Anyone targeting ARM without sdiv/udiv aka hardware division, to avoid suffering a 2x slowdown in this and similar Rust code, LLVM has to be built with this patch:

http://llvm.org/viewvc/llvm-project?view=revision&revision=259657

Should land in 3.8.1 but until then here’s probably the only unbroken rustc in existence :grin:


#11

Hmm, I’m also seeing a difference:

Rust:

1625
454396537

C++:

1099
454396537

However, with the updated code from mbrubeck they’re about even!

Ruest (mbrubeck):

1102
454396537

I used -C opt-level=3 and -O3 to compile the code.

rustc --version
rustc 1.8.0 (db2939409 2016-04-11)

clang++ --version
Apple LLVM version 7.3.0 (clang-703.0.31)
Target: x86_64-apple-darwin15.4.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

#12

Here’s the diff of the optimized ASM from Rust Playground in release mode. Edit: Hmm, on second glance it seems both versions have runtime checks. I’m not sure then but I linked an updated diff below. :slight_smile:

Rust Playground: Original Version (with updated time measuring)
Rust Playground: Updated Version

ASM diff


#13

On the rust playground this version is the fastest one because it doesn’t use reference in is_prim unlike the updated version:

use std::time::Instant;

fn main() {
    let mut anti_optimization = 0;

    let t1 = Instant::now();
    for i in 2..100000{
        if is_prim(i){
            anti_optimization += i;
        }
    }
    let diff = t1.elapsed();
    println!("{}{:03}", diff.as_secs(), diff.subsec_nanos() / 1_000_000);
    println!("{}", anti_optimization);
}

fn is_prim(x: i32) -> bool {
    (2..x).all(|i| x % i != 0)
}

Rust Playground: Mixed Version