Hi, I am a Rust newbie.
I played with it and made a prime number generator - up to the values of ~10^9 .
I made the same program in Node. I was amazed when Node program was faster. For values up to 10^7 it took Rust 14.6s and Node 4.8s.
The code is provided bellow. I could change the position of sqrt check but that's not the point.
1st argument is the limit up to which primes are generated - 10000000 for the times mentioned before
Rust code - rustc 1.43.0 (4fb7144ed 2020-04-20)
use std::env;
use std::time::Instant;
use std::fs::File;
use std::io::Write;
fn main() {
let args: Vec<String> = env::args().collect();
let limit: u32 = args[1].parse().unwrap();
if(limit > u32::MAX - 1) {
println!("Input limited to u32::MAX - 1 = 4_294_967_294u32");
return;
}
let mut primes: Vec<u32> = Vec::new();
primes.push(2);
let mut i: u32 = 3;
let mut j: u32 = 0;
let start = Instant::now();
loop {
if(i > limit) { break; }
let mut factor: u32 = 0;
let sqrt: f64 = (i as f64).sqrt();
let sqrt_u32: u32 = sqrt.ceil() as u32;
let len: u32 = (primes.len() as u32) - 1;
loop {
if(j > len) { break; }
let prime: u32 = primes[j as usize];
let modulo: u32 = i % prime;
if(modulo < 1) {
factor = prime;
break;
}
// if((prime as f64) > sqrt) { break; }
if(prime > sqrt_u32) { break; }
j = j + 1;
}
j = 0;
if(factor < 1) {
// new prime found
primes.push(i);
}
if(i%1000000 < 1) { println!("Logging {}", i.to_string()); }
i = i + 1;
}
let duration = start.elapsed();
println!("Duration: {:?}", duration);
// print
/*
for i in 0..=primes.len() - 1 {
println!("{}", primes[i].to_string());
}
*/
// write
let file_name: String = format!("{}.txt", limit.to_string());
let mut file = File::create(file_name).expect("Unable to create file");
write!(file, "Limit: {}\nFound: {}\nDuration: {:?}\n[", limit.to_string(), primes.len().to_string(), duration);
for prime in &primes {
write!(file, "{}, ", prime);
}
}
Node.js code - Node v12.16.2
let fs = require('fs');
const args = process.argv.slice(2)
let limit = args[0] || 100;
let primes = [2];
let start = Date.now();
for(let i = 3; i <= limit; i++) {
// check if i has any factors among prior primes
let factor = -1;
let sqrt = Math.sqrt(i);
for(j = 0; j < primes.length; j++) {
let prime = primes[j]
let mod = i % prime;
if(mod < 1) {
factor = prime;
break;
}
if(prime > sqrt) break;
}
if(factor < 0) {
// new prime found
primes.push(i);
}
if(i%1000000 < 1) console.log('Logging', i);
}
let duration = Date.now() - start;
let string = 'Limit: ' + limit + '\n';
string += 'Found: ' + primes.length + '\n';
string += 'Duration: ' + duration + '\n';
string += JSON.stringify(primes);
fs.writeFileSync(limit + '.txt', string);
Is there anything fundamentally wrong that I am doing in my rust code so that performance is bad?
Using iterators and for loops in the Rust code would also make it more efficient, as iterators generally optimise well. The variable j appears to only be used to index into primes, so iterating over primes directly would avoid the bounds check when indexing. Using for loops is also more idiomatic as it creates more structured code.
use std::env;
use std::time::Instant;
use std::fs::File;
use std::io::Write;
fn main() {
let args: Vec<String> = env::args().collect();
let limit: u32 = args[1].parse().unwrap();
if limit > u32::MAX - 1 {
println!("Input limited to u32::MAX - 1 = 4_294_967_294u32");
return;
}
// will grow as needed, but will not need to allocate many times
let mut primes: Vec<u32> = Vec::with_capacity(100_000);
primes.push(2);
let start = Instant::now();
// use for loop in iterators instead of just using loops,
// .for_each() is good too.
for i in 3..=limit {
let mut factor: u32 = 0;
let sqrt: f64 = (i as f64).sqrt();
let sqrt_u32: u32 = sqrt.ceil() as u32;
// same like loop above, using iterators is more efficient, accessing elements would require bound checks.
for prime in primes.iter() {
let prime = *prime as u32;
let modulo: u32 = i % prime;
if modulo == 0 {
factor = prime;
break;
}
if prime > sqrt_u32 {
break;
}
}
if factor == 0 {
// new prime found
primes.push(i);
}
}
let duration = start.elapsed();
println!("Duration: {:?}, {} primes found", duration, primes.len());
// print
/*
for i in 0..=primes.len() - 1 {
println!("{}", primes[i].to_string());
}
*/
// write
// let file_name: String = format!("{}.txt", limit.to_string());
// let mut file = File::create(file_name).expect("Unable to create file");
// write!(file, "Limit: {}\nFound: {}\nDuration: {:?}\n[", limit.to_string(), primes.len().to_string(), duration);
// for prime in &primes {
// write!(file, "{}, ", prime);
// }
}
I'm surprised, but my functional-style implementation using iterators performs about the same as the original. I'm guessing the code is simple enough that the optimiser can optimise both implementations down to (roughly) the same machine code.
fn calculate_primes_functional(limit: u32) -> Vec<u32> {
let mut primes = vec![2];
for candidate in 3..limit {
let sqrt_candidate = (candidate as f64).sqrt().ceil() as u32;
let has_factors = primes
.iter()
.copied()
.take_while(|prime| *prime <= sqrt_candidate)
.any(|prime| candidate % prime == 0);
if !has_factors {
primes.push(candidate);
}
}
primes
}
A tidied up version of @jakasi2's original function:
let mut primes = vec![2];
for i in 3..limit {
let mut factor: u32 = 0;
let sqrt: f64 = (i as f64).sqrt();
let sqrt_u32: u32 = sqrt.ceil() as u32;
let len: u32 = (primes.len() as u32) - 1;
for j in 0..len {
let prime: u32 = primes[j as usize];
let modulo: u32 = i % prime;
if modulo < 1 {
factor = prime;
break;
}
if prime > sqrt_u32 {
break;
}
}
if factor < 1 {
// new prime found
primes.push(i);
}
}
primes
}
And running it on my computer:
$ cargo bench
Compiling primes v0.1.0 (/tmp/primes)
Finished bench [optimized] target(s) in 0.26s
Running target/release/deps/primes-1da2e23e7df7b32f
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out
Running target/release/deps/bench-8f04c133c8165076
running 2 tests
test functional_method ... bench: 33,520,543 ns/iter (+/- 715,806)
test iterative_method ... bench: 33,861,599 ns/iter (+/- 888,395)
test result: ok. 0 passed; 0 failed; 0 ignored; 2 measured; 0 filtered out
cargo bench 20.72s user 0.10s system 101% cpu 20.579 total
Like I said, with optimisations turned on Rust generally optimises iterators well to something like the imperative equivalent. Of course, without optimisation they tend to be slower because you are writing a sequence of function calls that aren't inlined or optimised.
You've reintroduced the indexing of primes there. In @naim 's code, iterating over primes removes the bounds check that is introduced by indexing.
So.. we have a collection of some sort that can be iterated upon. .iter() returns the iterator. On each pass the for loop calls .next() on the iterator and the pointer to the next value is set to the variable which is specified (for variable in primes.iter()) . Is this the correct explanation?
What is with indexing and bounds on Vec. I know about linked lists and why iterating is better than indexing - O(n) - but I choose Vec because I red on std::collections - Rust that Vec has get of O(1) and is fine for just pushing things. What bounds are calculated when I want to retrieve a random element, primes[25] for example.
Correct, it calls the iterator's next on every iteration, which will give you a reference to the value (&u32).
When you read primes[25], the std::ops::Index implementation of Vec is called, which asserts that 25 is between 0 and self.len(). You can avoid this check by iterating, or calling the unsafe get_unchecked.
Continuing on from @naim's comment. Using normal indexing means you do one check at the top of the for-loop to see whether we need to keep going, then a second check when you actually do the indexing.
When you use an iterator it'll call the next() method, which will do a single check to see if we need to keep going (source code), returning a reference to the next element calculated using pointer arithmetic if we haven't reached the end.
Often the compiler is smart enough to identify that we don't need the indexing check if the for-loop check is happy, but this isn't always the case.
Using cargo bench always compiles with optimisations... That's why I was surprised that my functional-style implementation using just iterators and no indexing performed almost identically.
The .take_while() adapter will pass a reference to the next item to the predicate. It doesn't give you the item by value.
The docs for take_while() have an example for exactly this situation:
Because the closure passed to take_while() takes a reference, and many iterators iterate over references, this leads to a possibly confusing situation, where the type of the closure is a double reference:
let a = [-1, 0, 1];
let mut iter = a.iter().take_while(|x| **x < 0); // need two *s!
assert_eq!(iter.next(), Some(&-1));
assert_eq!(iter.next(), None);