Small prime generation - Node faster than Rust

Hi, I am a Rust newbie.
I played with it and made a prime number generator - up to the values of ~10^9 .
I made the same program in Node. I was amazed when Node program was faster. For values up to 10^7 it took Rust 14.6s and Node 4.8s.

The code is provided bellow. I could change the position of sqrt check but that's not the point.

1st argument is the limit up to which primes are generated - 10000000 for the times mentioned before

Rust code - rustc 1.43.0 (4fb7144ed 2020-04-20)

use std::env;
use std::time::Instant;
use std::fs::File;
use std::io::Write;

fn main() {
    let args: Vec<String> = env::args().collect();
    let limit: u32 = args[1].parse().unwrap();

    if(limit > u32::MAX - 1) { 
        println!("Input limited to u32::MAX - 1 = 4_294_967_294u32");
        return;
    }

    let mut primes: Vec<u32> = Vec::new();
    primes.push(2);

    let mut i: u32 = 3;
    let mut j: u32 = 0;

    let start = Instant::now();
    loop {
        if(i > limit) { break; }
        
        let mut factor: u32 = 0;
        let sqrt: f64 = (i as f64).sqrt();

        let sqrt_u32: u32 = sqrt.ceil() as u32;
        let len: u32 = (primes.len() as u32) - 1;

        loop {
            if(j > len) { break; }

            let prime: u32 = primes[j as usize];
            let modulo: u32 = i % prime;

            if(modulo < 1) {
                factor = prime;
                break;
            }

            // if((prime as f64) > sqrt) { break; }
            if(prime > sqrt_u32) { break; }

            j = j + 1;
        }
        j = 0;

        if(factor < 1) {
            // new prime found
            primes.push(i);
        }
        if(i%1000000 < 1) { println!("Logging {}", i.to_string()); }

        i = i + 1;
    }

    let duration = start.elapsed();
    println!("Duration: {:?}", duration);

    // print
    /*
    for i in 0..=primes.len() - 1 {
        println!("{}", primes[i].to_string());
    }
    */

    // write
    let file_name: String = format!("{}.txt", limit.to_string());
    let mut file = File::create(file_name).expect("Unable to create file");                                                          
    write!(file, "Limit: {}\nFound: {}\nDuration: {:?}\n[", limit.to_string(), primes.len().to_string(), duration);
    for prime in &primes {
        write!(file, "{}, ", prime);
    }
}

Node.js code - Node v12.16.2

let fs = require('fs');
const args = process.argv.slice(2)

let limit = args[0] || 100;
let primes = [2];

let start = Date.now();

for(let i = 3; i <= limit; i++) {
    // check if i has any factors among prior primes
    let factor = -1;
    let sqrt = Math.sqrt(i);

    for(j = 0; j < primes.length; j++) {
        let prime = primes[j]
        let mod = i % prime;

        if(mod < 1) {
            factor = prime;
            break;
        }

        if(prime > sqrt) break;
    }

    if(factor < 0) {
        // new prime found
        primes.push(i);
    }

    if(i%1000000 < 1) console.log('Logging', i);
}

let duration = Date.now() - start;
let string = 'Limit: ' + limit + '\n';
string += 'Found: ' + primes.length + '\n';
string += 'Duration: ' + duration + '\n';

string += JSON.stringify(primes);

fs.writeFileSync(limit + '.txt', string);

Is there anything fundamentally wrong that I am doing in my rust code so that performance is bad?

Thank you for your time :slight_smile:

Did you try it in release mode?

I just compiled it.
rustc generate_primes.rs

That compiles generate_primes.rs in debug mode, so you'll get a binary which is easy to debug, but doesn't have any optimisations applied.

Can you run rustc -O generate_primes.rs and try again?

1 Like

It did the trick - I got to 2.6s .
Thanks :sweat_smile:

PS. for 10^8 I went from 78s in Node to 61s in Rust

3 Likes

Using iterators and for loops in the Rust code would also make it more efficient, as iterators generally optimise well. The variable j appears to only be used to index into primes, so iterating over primes directly would avoid the bounds check when indexing. Using for loops is also more idiomatic as it creates more structured code.

1 Like

This is a slightly optimized version:

use std::env;
use std::time::Instant;
use std::fs::File;
use std::io::Write;

fn main() {
    let args: Vec<String> = env::args().collect();
    let limit: u32 = args[1].parse().unwrap();

    if limit > u32::MAX - 1 { 
        println!("Input limited to u32::MAX - 1 = 4_294_967_294u32");
        return;
    }

    // will grow as needed, but will not need to allocate many times
    let mut primes: Vec<u32> = Vec::with_capacity(100_000);
    primes.push(2);

    let start = Instant::now();
    // use for loop in iterators instead of just using loops, 
    // .for_each() is good too.
    for i in 3..=limit {
        let mut factor: u32 = 0;
        let sqrt: f64 = (i as f64).sqrt();

        let sqrt_u32: u32 = sqrt.ceil() as u32;

        // same like loop above, using iterators is more efficient, accessing elements would require bound checks.
        for prime in primes.iter() {
            let prime = *prime as u32;
            let modulo: u32 = i % prime;

            if modulo == 0 {
                factor = prime;
                break;
            }

            if prime > sqrt_u32 { 
                break; 
            }
        }

        if factor == 0 {
            // new prime found
            primes.push(i);
        }
    }

    let duration = start.elapsed();
    println!("Duration: {:?}, {} primes found", duration, primes.len());
    

    // print
    /*
    for i in 0..=primes.len() - 1 {
        println!("{}", primes[i].to_string());
    }
    */

    // write
    // let file_name: String = format!("{}.txt", limit.to_string());
    // let mut file = File::create(file_name).expect("Unable to create file");                                                          
    // write!(file, "Limit: {}\nFound: {}\nDuration: {:?}\n[", limit.to_string(), primes.len().to_string(), duration);
    // for prime in &primes {
    //     write!(file, "{}, ", prime);
    // }
}

I'm surprised, but my functional-style implementation using iterators performs about the same as the original. I'm guessing the code is simple enough that the optimiser can optimise both implementations down to (roughly) the same machine code.

fn calculate_primes_functional(limit: u32) -> Vec<u32> {
    let mut primes = vec![2];

    for candidate in 3..limit {
        let sqrt_candidate = (candidate as f64).sqrt().ceil() as u32;

        let has_factors = primes
            .iter()
            .copied()
            .take_while(|prime| *prime <= sqrt_candidate)
            .any(|prime| candidate % prime == 0);

        if !has_factors {
            primes.push(candidate);
        }
    }

    primes
}

A tidied up version of @jakasi2's original function:

    let mut primes = vec![2];

    for i in 3..limit {
        let mut factor: u32 = 0;
        let sqrt: f64 = (i as f64).sqrt();

        let sqrt_u32: u32 = sqrt.ceil() as u32;
        let len: u32 = (primes.len() as u32) - 1;

        for j in 0..len {
            let prime: u32 = primes[j as usize];
            let modulo: u32 = i % prime;

            if modulo < 1 {
                factor = prime;
                break;
            }

            if prime > sqrt_u32 {
                break;
            }
        }

        if factor < 1 {
            // new prime found
            primes.push(i);
        }
    }

    primes
}

And running it on my computer:

$ cargo bench
   Compiling primes v0.1.0 (/tmp/primes)
    Finished bench [optimized] target(s) in 0.26s
     Running target/release/deps/primes-1da2e23e7df7b32f

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

     Running target/release/deps/bench-8f04c133c8165076

running 2 tests
test functional_method ... bench:  33,520,543 ns/iter (+/- 715,806)
test iterative_method  ... bench:  33,861,599 ns/iter (+/- 888,395)

test result: ok. 0 passed; 0 failed; 0 ignored; 2 measured; 0 filtered out

cargo bench  20.72s user 0.10s system 101% cpu 20.579 total

(playground)

1 Like

Like I said, with optimisations turned on Rust generally optimises iterators well to something like the imperative equivalent. Of course, without optimisation they tend to be slower because you are writing a sequence of function calls that aren't inlined or optimised.

You've reintroduced the indexing of primes there. In @naim 's code, iterating over primes removes the bounds check that is introduced by indexing.

The thing is that Rust can often optimize the iterator version better, since it can skip bounds checks and thus auto-vectorize.

So.. we have a collection of some sort that can be iterated upon. .iter() returns the iterator. On each pass the for loop calls .next() on the iterator and the pointer to the next value is set to the variable which is specified (for variable in primes.iter()) . Is this the correct explanation?

What is with indexing and bounds on Vec. I know about linked lists and why iterating is better than indexing - O(n) - but I choose Vec because I red on std::collections - Rust that Vec has get of O(1) and is fine for just pushing things. What bounds are calculated when I want to retrieve a random element, primes[25] for example.

Thanks

Correct, it calls the iterator's next on every iteration, which will give you a reference to the value (&u32).

When you read primes[25], the std::ops::Index implementation of Vec is called, which asserts that 25 is between 0 and self.len(). You can avoid this check by iterating, or calling the unsafe get_unchecked.

1 Like

Continuing on from @naim's comment. Using normal indexing means you do one check at the top of the for-loop to see whether we need to keep going, then a second check when you actually do the indexing.

When you use an iterator it'll call the next() method, which will do a single check to see if we need to keep going (source code), returning a reference to the next element calculated using pointer arithmetic if we haven't reached the end.

Often the compiler is smart enough to identify that we don't need the indexing check if the for-loop check is happy, but this isn't always the case.

2 Likes

Using cargo bench always compiles with optimisations... That's why I was surprised that my functional-style implementation using just iterators and no indexing performed almost identically.

Why are you dereferencing *prime. Isn't the point of .copied() that it returns iterator of T instead of &T?

Anyhow.. nice solution :slight_smile: :+1:

The .take_while() adapter will pass a reference to the next item to the predicate. It doesn't give you the item by value.

The docs for take_while() have an example for exactly this situation:

Because the closure passed to take_while() takes a reference, and many iterators iterate over references, this leads to a possibly confusing situation, where the type of the closure is a double reference:

let a = [-1, 0, 1];

let mut iter = a.iter().take_while(|x| **x < 0); // need two *s! 

assert_eq!(iter.next(), Some(&-1)); 
assert_eq!(iter.next(), None);
1 Like

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.