Rustification, Fast Vec<bool> creation

I'm fairly new to rust, and I'm converting some of my projects from Python to Rust.
This is a prime tester tool I made in python for fun (then used in some actual project)
I did multiple functions to see the improvements of various tricks to speed up (in this case) Sieve of Eratosthenes algorythm.
Until last test, rust followed the improvements I had in Python, can you help me understand what am I doing wrong?
Code is on my github too:

fn sieve_vanilla(n: u64) -> Vec<u64>{
    let mut primes = vec![true; (n+1) as usize];
    
    let mut res: Vec<u64> = Vec::new();
    primes[0] = false;
    primes[1] = false;

    for i in 2..(n+1) as usize {
        if primes[i] {
            res.push(i as u64);
            let mut j = i + i;
            while j <= n as usize {
                primes[j] = false;
                j += i;
            }
        }
    }


    res
}

fn sieve_001(n: u64) -> Vec<u64>{
    let mut primes = vec![true; (n+1) as usize];
    
    let mut res: Vec<u64> = Vec::new();
    primes[0] = false;
    primes[1] = false;

    for i in 2..((n+1) as f64).sqrt() as usize {
        if primes[i] {
            res.push(i as u64);
            let mut j = i + i;
            while j <= n as usize {
                primes[j] = false;
                j += i;
            }
        }
    }
    
    for i in ((n as f64).sqrt() as usize + 1)..(n+1) as usize {
        if primes[i] {
            res.push(i as u64);
        }
    }

    res
}

fn sieve_002(n: u64) -> Vec<u64>{
    let mut primes = vec![true; (n+1) as usize];
    
    let mut res: Vec<u64> = Vec::new();
    primes[0] = false;
    primes[1] = false;

    for i in 2..((n+1) as f64).sqrt() as usize {
        if primes[i] {
            res.push(i as u64);
            let mut j = i * i;
            while j <= n as usize {
                primes[j] = false;
                j += i;
            }
        }
    }
    
    for i in ((n as f64).sqrt() as usize + 1)..(n+1) as usize {
        if primes[i] {
            res.push(i as u64);
        }
    }

    res
}

fn sieve_003(n: u64) -> Vec<u64>{
    let mut primes: Vec<bool> = vec![[false, true]; ((n+1)/2) as usize].into_iter().flatten().collect();
    if (n+1) % 2 != 0 {
        primes.push(false);
    }
    primes[1] = false;
    primes[2] = true;

    let mut res: Vec<u64> = Vec::new();
    res.push(2 as u64);
    
    let limit = ((n as f64).sqrt() as usize) + 1;
    for i in (3..limit).step_by(2) {
        if primes[i] {
            res.push(i as u64);
            let mut j = i * i;
            while j <= n as usize {
                primes[j] = false;
                j += i;
            }
        }
    }
    
    for i in ((n as f64).sqrt() as usize + 1)..(n+1) as usize {
        if primes[i] {
            res.push(i as u64);
        }
    }

    res
}

fn _test_vec_creation(n: u64){
    // test if making a true vec and loop over multiple of 2 is faster than take.flatten.collect

    // method 1
    let start = Instant::now();
    let mut primes = vec![true; (n+1) as usize];
    primes[0] = false;
    primes[1] = false;
    for i in (4..(n+1) as usize).step_by(2) {
        primes[i] = false;
    }
    let duration = start.elapsed();
    println!("Method 1 in {:?}", duration); // Print the elapsed time

    // method 2
    let start = Instant::now();
    let mut primes: Vec<bool> = vec![[false, true]; ((n+1)/2) as usize].into_iter().flatten().collect();
    if (n+1) % 2 != 0 {
        primes.push(false);
    }
    primes[1] = false;
    primes[2] = true;

    let duration = start.elapsed();
    println!("Method 2 in {:?}", duration); // Print the elapsed time

}

fn main() {
    let args = Cli::parse();
    if args.mode == Mode::List{
        let start = Instant::now();
        let _values = sieve_vanilla(args.value);
        let duration = start.elapsed();
        println!("sieve_vanilla Done in {:?}", duration); // Print the elapsed time

        let start = Instant::now();
        let _values = sieve_001(args.value);
        let duration = start.elapsed();
        println!("001 Done in {:?}", duration); // Print the elapsed time

        let start = Instant::now();
        let _values = sieve_002(args.value);
        let duration = start.elapsed();
        println!("002 Done in {:?}", duration); // Print the elapsed time

        let start = Instant::now();
        let _values = sieve_003(args.value);
        let duration = start.elapsed();
        println!("003 Done in {:?}", duration); // Print the elapsed time

    }

}

With 25milion as input I have:

sieve_vanilla Done in 807.8685ms
001 Done in 643.6626ms
002 Done in 611.4306ms
003 Done in 991.5997ms

The 003 is slower than 002, when essentially I improved a lot of things there:

  1. My Vec of primes generation is halved thanks to
    vec![[false, true]; ((n+1)/2) as usize].into_iter().flatten().collect();
  2. The foor loop starts at 3 and has step of 2, halvening the value to check

Point 1 is correct in Python but it is essentially slower in rust! The function

fn _test_vec_creation(n: u64){
    // test if making a true vec and loop over multiple of 2 is faster than take.flatten.collect

    // method 1
    let start = Instant::now();
    let mut primes = vec![true; (n+1) as usize];
    primes[0] = false;
    primes[1] = false;
    for i in (4..(n+1) as usize).step_by(2) {
        primes[i] = false;
    }
    let duration = start.elapsed();
    println!("Method 1 in {:?}", duration); // Print the elapsed time

    // method 2
    let start = Instant::now();
    let mut primes: Vec<bool> = vec![[false, true]; ((n+1)/2) as usize].into_iter().flatten().collect();
    if (n+1) % 2 != 0 {
        primes.push(false);
    }
    primes[1] = false;
    primes[2] = true;

    let duration = start.elapsed();
    println!("Method 2 in {:?}", duration); // Print the elapsed time

}

Serve just this purpose to test the two Vec creation methods, with n = 25milion the results:

Method 1 in 172.89ms
Method 2 in 553.5054ms

Instead in python the improvement in method 3 was awesome:

Vanilla:  0.21936464309692383
1:  0.11927986145019531
2:  0.10861492156982422
3:  0.0701291561126709

Can you help me understand if it just different in Rust or if my flatten implementation is slowing everything and there is a Rustic way of doing it?
Thanks

Your second method allocates two vectors vec![...] will allocate the first vector and collect() will allocate the second one.

std::iter::repeat([false, true]).take(((n+1)/2) as usize).flatten().collect() should work.

Generally, micro optimizations that work in python do not work as well in rust because at some level the bottlenecks are different.

3 Likes

I don't think it's going to change the relative performance in this case, but here is a mandatory reminder:

Are you running with --release?

2 Likes

I really have a lot to read!

Thanks for your time!

uhmmmm no,

cargo run -- 2500000

why?

Because the default profile is opt-level = 0 whereas --release is opt-level = 3.

3 Likes

this actually open a lot of topics to study. rust I so complex! thanks a lot!

Because for things that aren't IO-limited, it's common for release mode to be 20-50 times faster :slight_smile:

6 Likes

Instead of a Vec<bool>, which uses one byte per bool, have you considered using a bitvec type that'll use each bit to represent the true/false values?

That should make your vector about 8x smaller, which might have increase performance if the improved cache locality makes up for the extra instructions needed to access a particular bit.

1 Like

Algorithmic feedback...

Here's another version which takes about half the time of the other variations on my laptop. The idea is that you only store booleans for odd numbers (so sieve[i] corresponds to n = 2*i + 1).

Sorry for the lack of comments.

fn sieve_004(n: u64) -> Vec<u64> {
    let limit = n as usize;
    let sz = limit / 2;
    let mut sieve = vec![true; sz + 1];
    sieve[0] = false;
    for j in 1..(sz+1) {
        if sieve[j] {
            let mut k = 2 * j * (1 + j);
            while k < sz {
                sieve[k] = false;
                k += 2 * j + 1;
            }
        }
    }

    [2].iter().cloned().chain(
        (&sieve[1..sz]).into_iter().enumerate().filter_map(
            |(i, &b)| if b { Some(2 * i as u64 + 3) } else { None }
        )
    ).collect()
}
1 Like

Hi,
Thanks for the variation, that's indeed one of the improvements in my Python implementation.
I didn't consider yet how to "rustify", so this is mostply apreciated!
I will have to watch closely what that [2].iter().cloned().chain() wizardry does though!
Thanks

This is a very nice data structure, I will surely try it.
Thanks!

It should really be [2].into_iter().chain(...) (old code). Or once(2).chain(...).

You can read more about chain here. But the basic idea is to create an iterator of "2, then all the odd primes".

I haven't done any recent testing of collecting the iterator verus pushing in the loop. But another consideration (you probably are aware of) is that you can use a good upper bound on the number of primes below N to pre-allocate the Vec<u64>, for push or extend approaches.