Forgetting about rust for a moment, let’s talk prime number algorithms.

If you really want to make it fast, a Sieve of Eratosthenes is even faster.

The premise is simple: You have a big boolean array. Each time you find a number that’s not crossed off, you cross off all of its multiples.

- 2 is not crossed off, so cross off 4, 6, 8, 10, 12, 14, 16, 18, 20…
- 3 is not crossed off, so cross off 6, 9, 12, 15, 18, 21, 24, 27…
- 5 is not crossed off, so cross off 10, 15, 20, 25, 30, 35, 40, 45…

Notably, each time you encounter a new prime `p`

, you’ve already crossed off all multiples smaller than `p*p`

, so you can start from that number:

- 2 is not crossed off, so cross off 4, 6, 8, 10, 12, 14, 16, 18, 20…
- 3 is not crossed off, so cross off 9, 12, 15, 18, 21, 24, 27…
- 5 is not crossed off, so cross off 25, 30, 35, 40, 45…

This algorithm performs integer multiplication and addition where yours performed integer modulus, and I believe it has a lower big-O complexity as well.

```
fn sieve_of_eratosthenes(limit: i64) -> Vec<bool> {
let mut sieve: Vec<_> = vec![true; limit as usize];
sieve[0] = false;
sieve[1] = false;
for x in 2..limit {
let first_new_multiple = x*x;
if first_new_multiple > limit { break }
if sieve[x as usize] {
for m in (first_new_multiple..limit).step_by(x as usize) {
sieve[m as usize] = false;
}
}
}
sieve
}
```

This can find all primes up to `10^8`

in 1.183s on my machine.

There’s a number of ways you can improve upon this, which I’ll leave as an exercise to the reader:

- Optimizing for multiples of 2 and maybe 3 so that you can cross off fewer repeat numbers, similar to what you did above. (This is called a
**wheel optimization**).
- Similar to the above, but compress the representation of the list so that it doesn’t even contain data for multiples of 2 or 3. Return a new type
`struct Sieve(Vec<bool>)`

with methods that check the right element for a given number. (or maybe produce a list of primes from the compressed mask and return that)
- Turn it into a
**factor sieve.** Basically, change the output to a `Vec<i64>`

, where `sieve[n]`

returns the smallest factor of `n`

for `n > 1`

(and so primes are thus numbers where `sieve[n] == n`

). This is extremely useful if you need to compute the factorization of many numbers.
- I
**would not** bother trying to parallelize it. The algorithm is inherently serial.

For large enough limits, you’ll find that CPU cache misses begin to almost entirely dominate the cost of the algorithm. After all, every new prime requires it to scan through a significant portion of the list. So to push that time even lower, you’ll need to change the order of iteration to be cache-friendly.

This last upgrade will be far more difficult to implement than the others previously mentioned. It’s called the “segmented sieve,” and you can read about it here: https://primesieve.org/segmented_sieve.html