First of all: I am completely new to Rust. It might absolutely be the case that I implemented a "non-optimal version" in Rust.
I made a little benchmark for Rust and C++. Both benchmarks check the numbers from 2 to 100000 for primality and, if a number is prime, add it to a running total. Both implementations use the same algorithm. The algorithm itself is pretty bad, I know; you could use a Sieve of Eratosthenes approach instead, or at least only check divisors up to the square root of the number, but that's not the point of this benchmark. Even though the algorithm is bad, the results should be more or less equal, because "the book" states: "Rust is a systems programming language focused on three goals: safety, speed, and concurrency." However, they are not.
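Just to illustrate what I mean (a rough sketch only, not part of the benchmark; is_prim_sqrt is just an example name), a square-root-bounded check in Rust could look roughly like this:

// Sketch only (is_prim_sqrt is an illustrative name): trial-divide up to the
// integer square root of x instead of all the way up to x.
fn is_prim_sqrt(x: i32) -> bool {
    let limit = (x as f64).sqrt() as i32;
    (2..limit + 1).all(|i| x % i != 0)
}

fn main() {
    // Quick sanity check on a few small values.
    for n in [2, 3, 4, 9, 97, 100].iter() {
        println!("{} -> {}", n, is_prim_sqrt(*n));
    }
}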
In my environment the C++ code took around 1.2 seconds (Visual Studio, default release settings).
Rust took around 1.7 seconds (cargo run --release).
Two questions:
1. Why is Rust around 30% slower in this benchmark?
2. Is there some way to improve the speed of the Rust version without changing the algorithm?
Rust src:
extern crate time;

fn main() {
    let mut anti_optimization = 0;

    let t1 = time::now();
    for i in 2..100000 {
        if is_prim(i) {
            anti_optimization += i;
        }
    }
    let t2 = time::now();

    let diff = t2 - t1;
    println!("{}", diff.num_milliseconds());
    println!("{}", anti_optimization);
}

fn is_prim(x: i32) -> bool {
    for i in 2..x {
        if x % i == 0 {
            return false;
        }
    }
    true
}
C++ src:
#include <iostream>
#include <chrono>

bool isPrim(int n);

int main()
{
    int antiOptimization = 0;

    auto t1 = std::chrono::high_resolution_clock::now();
    for (int i = 2; i < 100000; i++) {
        if (isPrim(i)) {
            antiOptimization += i;
        }
    }
    auto t2 = std::chrono::high_resolution_clock::now();

    auto diff = t2 - t1;
    int ms = std::chrono::duration_cast<std::chrono::milliseconds>(diff).count();

    std::cout << ms << std::endl;
    std::cout << antiOptimization << std::endl;

    while (true);

    return 0;
}

bool isPrim(int n)
{
    for (int i = 2; i < n; i++) {
        if (n % i == 0) {
            return false;
        }
    }
    return true;
}
My guess is that this is a difference between Clang and MSVC rather than between Rust and C++. It would be interesting to hear your results for the C++ version compiled with Clang!
As for possible improvements to the Rust code, here is at least an optimization of the source code size:
extern crate time;

fn main() {
    let t1 = time::now();
    let anti_optimization = (2..100000)
        .filter(|&i| is_prim(i))
        .fold(0, |x, y| x + y); // .sum is unstable :(
    let t2 = time::now();

    let diff = t2 - t1;
    println!("{}", diff.num_milliseconds());
    println!("{}", anti_optimization);
}

fn is_prim(x: i32) -> bool {
    (2..x).all(|i| x % i != 0)
}
Surprisingly, this version of is_prim makes the Rust program run about 30% faster on my machine! I'm using rustc 1.10.0-nightly (cd6a40017 2016-05-16) on x86_64 Linux with opt-level = 3.
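Aside, regarding the ".sum is unstable" comment above: on a toolchain where Iterator::sum has since been stabilized, the fold can be replaced with a plain sum. A minimal sketch, assuming such a newer compiler:

// Sketch assuming a toolchain where Iterator::sum is stable; the type
// annotation on anti_optimization tells the compiler which sum to produce.
// This replaces the fold(0, |x, y| x + y) workaround above.
fn is_prim(x: i32) -> bool {
    (2..x).all(|i| x % i != 0)
}

fn main() {
    let anti_optimization: i32 = (2..100000)
        .filter(|&i| is_prim(i))
        .sum();
    println!("{}", anti_optimization);
}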
Side note: If you want the Rust code to use a high-resolution monotonic clock (like the C++ code does), use std::time::Instant instead of time::now. Though at this scale it shouldn't make a difference most of the time.
Here's the Rust code I ended up with:
use std::ops::Add;
use std::time::Instant;

fn main() {
    let t1 = Instant::now();

    let anti_optimization = (2..100_000).filter(is_prim).fold(0, i32::add);

    let diff = t1.elapsed();
    println!("{}{:03}", diff.as_secs(), diff.subsec_nanos() / 1_000_000);
    println!("{}", anti_optimization);
}

fn is_prim(&x: &i32) -> bool {
    (2..x).all(|i| x % i != 0)
}
Update: On the same machine, the C++ version compiled with g++ 5.3.1 is the same speed as this Rust version (difference of < 0.5%, which is within the statistical noise). When compiled with clang++ 3.8.0, the C++ program is 3–4% slower. (Older versions of clang++ produce even slower code.) All benchmarking done with -O3.
For anyone targeting ARM without sdiv/udiv (i.e. without hardware division): to avoid a 2x slowdown in this and similar Rust code, LLVM has to be built with this patch:
Here's the diff of the optimized ASM from the Rust Playground in release mode. Edit: Hmm, on second glance it seems both versions have runtime checks. I'm not sure then, but I linked an updated diff below.
On the Rust Playground this version is the fastest one, because it doesn't take the argument of is_prim by reference, unlike the updated version:
use std::time::Instant;

fn main() {
    let mut anti_optimization = 0;

    let t1 = Instant::now();
    for i in 2..100000 {
        if is_prim(i) {
            anti_optimization += i;
        }
    }
    let diff = t1.elapsed();

    println!("{}{:03}", diff.as_secs(), diff.subsec_nanos() / 1_000_000);
    println!("{}", anti_optimization);
}

fn is_prim(x: i32) -> bool {
    (2..x).all(|i| x % i != 0)
}