Why is this prime number benchmark 30% slower than C++?

Brotcrunsher · May 18, 2016, 1:39pm

Hey,

first of all: I am completly new to Rust. It might absolutly be the case that I implemented a "non optimal version" in rust.

I made a little benchmark for rust and c++. Both benchmarks check the numbers from 2 to 100000 if they are prime numbers and if they are they are added to a counter. Both implementations use the same algorithm. The algorithm itself is pretty bad, I know that you could do a "Sieve of Eratosthenes" approach instead or atleast calc the sqrt of the number to check - but thats not the point of this benchmark. Even though the algorithm itself is bad the results should be more or less equal because "the book" states: "Rust is a systems programming language focused on three goals: safety, speed,
and concurrency.". However, they are not.

In my environment the C++ code took around 1.2 seconds (Visual Studio, default release settings).
Rust took around 1.7 seconds (cargo run --release).

2 Questions:

Why is rust around 30% slower in this benchmark?
Is there some way to improve the speed of the rust version without changeing the algorithm?

Rust src:

extern crate time;

fn main(){
    let mut anti_optimization = 0;
    
    let t1 = time::now();
    for i in 2..100000{
        if is_prim(i){
            anti_optimization += i;
        }
    }
    let t2 = time::now();
    let diff = t2-t1;
    println!("{}", diff.num_milliseconds());
    println!("{}", anti_optimization);
}

fn is_prim(x: i32) -> bool{
    for i in 2..x{
        if x%i == 0 {
            return false;
        }
    }
    true
}

C++ src:

#include <iostream>
#include <chrono>

bool isPrim(int n);

int main()
{
    int antiOptimization = 0;
    auto t1 = std::chrono::high_resolution_clock::now();
    for (int i = 2; i < 100000; i++) {
        if (isPrim(i)) {
            antiOptimization += i;
        }
    }
    auto t2 = std::chrono::high_resolution_clock::now();
    auto diff = t2 - t1;
    int ms = std::chrono::duration_cast<std::chrono::milliseconds>(diff).count();
    std::cout << ms << std::endl;
    std::cout << antiOptimization << std::endl;
    while (true);
    return 0;
}

bool isPrim(int n)
{
    for (int i = 2; i < n; i++) {
        if (n%i == 0) {
            return false;
        }
    }
    return true;
}

Thanks!

mholub · May 18, 2016, 3:06pm

Just a guess, Rust uses checked arithmetic operations by default. Have you tried to use unchecked ones?

matklad · May 18, 2016, 3:09pm

Hm, I get pretty much identical results here on Linux (using clang++ -O2 to compile C++):

user@UNIT-326 [06:01:49 PM] [~/trash/prime] [master *]
-> % ./a.out               
1505
454396537
user@UNIT-326 [06:02:22 PM] [~/trash/prime] [master *]
-> % ./target/release/prime
1510
454396537

My guess is that it is the difference between Clang and MVCC and not between Rust and C++. It would be interesting to hear your results for C++ with clang!

As for the possible improvements of Rust code, here is an optimization for the source code size:

extern crate time;

fn main(){
    let t1 = time::now();
    let anti_optimization = (2..100000)
        .filter(|&i| is_prim(i))
        .fold(0, |x, y| x + y); // .sum is unstable :(
    let t2 = time::now();
    let diff = t2-t1;
    println!("{}", diff.num_milliseconds());
    println!("{}", anti_optimization);
}

fn is_prim(x: i32) -> bool{
    return (2..x).all(|i| x % i != 0)
}

klingt.net · May 18, 2016, 3:16pm

Both times are almost equal on my machine, too:

$ cargo run --release
Compiling prime v0.0.1 (file:///tmp/prime)
Running `target/release/prime`
2068
454396537
$ g++ -Wall -O3 prime.cpp -o prime &&
2113
454396537

In fact, the C++ code is a bit slower. I'm using rustc 1.8.0 (db2939409 2016-04-11) and g++ (GCC) 6.1.1 20160501.

klingt.net · May 18, 2016, 3:29pm

@Brotcrunsher: You can add the language shortcut to code blocks to get syntax highlighting:


```rust
fn main() ...

mbrubeck · May 18, 2016, 4:36pm

Surprisingly, this version of is_prim makes the Rust program run about 30% faster on my machine! I'm using rustc 1.10.0-nightly (cd6a40017 2016-05-16) on x86_64 Linux with opt-level = 3.

Side note: If you want the Rust code to use a high-resolution monotonic clock (like the C++ code does), use std::time::Instant instead of time::now. Though at this scale it shouldn't make a difference most of the time.

Here's the Rust code I ended up with:

use std::ops::Add;
use std::time::Instant;

fn main() {
    let t1 = Instant::now();
    let anti_optimization = (2..100_000).filter(is_prim).fold(0, i32::add);
    let diff = t1.elapsed();
    println!("{}{:03}", diff.as_secs(), diff.subsec_nanos() / 1_000_000);
    println!("{}", anti_optimization);
}

fn is_prim(&x: &i32) -> bool {
    (2..x).all(|i| x % i != 0)
}

Update: On the same machine, the C++ version compiled with g++ 5.3.1 is the same speed as this Rust version (difference of < 0.5%, which is within the statistical noise). When compiled with clang++ 3.8.0, the C++ program is 3–4% slower. (Older versions of clang++ produce even slower code.) All benchmarking done with -O3.

birkenfeld · May 18, 2016, 4:49pm

The original C++ (with clang++) and your latest Rust (with 1.10) are exactly the same speed on my Linux x86-64 machine.

mbrubeck · May 18, 2016, 4:50pm

Fixed, thanks.

kirillkh · May 19, 2016, 12:36pm

I wonder what the difference is...

PeteVine · May 19, 2016, 1:38pm

Anyone targeting ARM without sdiv/udiv aka hardware division, to avoid suffering a 2x slowdown in this and similar Rust code, LLVM has to be built with this patch:

http://llvm.org/viewvc/llvm-project?view=revision&revision=259657

Should land in 3.8.1 but until then here's probably the only unbroken rustc in existence

pom · May 20, 2016, 11:05pm

Hmm, I'm also seeing a difference:

Rust:

1625
454396537

C++:

1099
454396537

However, with the updated code from mbrubeck they're about even!

Ruest (mbrubeck):

1102
454396537

I used -C opt-level=3 and -O3 to compile the code.

rustc --version
rustc 1.8.0 (db2939409 2016-04-11)

clang++ --version
Apple LLVM version 7.3.0 (clang-703.0.31)
Target: x86_64-apple-darwin15.4.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

pom · May 20, 2016, 11:12pm

Here's the diff of the optimized ASM from Rust Playground in release mode. Edit: Hmm, on second glance it seems both versions have runtime checks. I'm not sure then but I linked an updated diff below.

Rust Playground: Original Version (with updated time measuring)
Rust Playground: Updated Version

ASM diff

xosmig · May 25, 2016, 2:08pm

On the rust playground this version is the fastest one because it doesn't use reference in is_prim unlike the updated version:

use std::time::Instant;

fn main() {
    let mut anti_optimization = 0;

    let t1 = Instant::now();
    for i in 2..100000{
        if is_prim(i){
            anti_optimization += i;
        }
    }
    let diff = t1.elapsed();
    println!("{}{:03}", diff.as_secs(), diff.subsec_nanos() / 1_000_000);
    println!("{}", anti_optimization);
}

fn is_prim(x: i32) -> bool {
    (2..x).all(|i| x % i != 0)
}

Rust Playground: Mixed Version

Topic		Replies	Views
Rust program has only 42% the speed of similar c++ program help	26	6531	July 28, 2022
How to speed up this rust code? I'm measuring a 30% slowdown versus the C++ version help	25	10144	January 12, 2023
Are there any compilation time benchmarks of Rust vs. G++ vs. Clang++?	24	14773	July 3, 2022
In my benchmark, I found Rust slower than C	6	2010	May 19, 2022
Benchmark to compare performance of Rust with C/C++? help	11	15796	April 5, 2020

Why is this prime number benchmark 30% slower than C++?

Related topics