Prime Pairs code for Goldbach's Conjecture

The first four and a half pages are very clearly correct. Then the last three paragraphs in page 5 are meaningless to me. Since the next page starts a new section, presumably those three paragraphs somehow prove the conjecture.

Would really like to have a trully fast Rust version.

In which case I'm not clear why you did not apply the change you noted (and I confirmed) as significant:

D|Crystal use arrays for lhr_del, not a HashSet, which I guess could be the bottleneck.

But I redid the work with the code you posted last, here are the versions:

  1. Remove the HashSet in favor of a (sorted) Vec and a binary search: Rust Playground
  2. Minimize what we printout and remove the retain loop (not sure if that is valid, but you did not answer): Rust Playground
  3. Finally, replace usize by u32: Rust Playground

Those are the run times, taken from my computer unscientifically:

Variant \ n 100_000_000 150_000_000 700_000_000
Base 10.7s 8.3s 70.0s
1. 4.6s 4.3s 31.5s
2. 3.9s 3.6s 26.7s
3. 3.5s 3.4s 23.6s

The HashSet -> Vec change seems to make up what your base version is missing in comparison to the other implementations, the other stuff is more micro, but going from 31.5s to 23.6s is nothing to sneeze at, of course.

2 Likes

Thanks you @KillTheMule.
I used your code suggestions to make a u32 and u64 version (like I did for D).

Here are the two versions.

use std::time::Instant;

fn coprime(mut m: u32, mut n: u32) -> bool {
  while m|1 != 1 { let t = m; m = n % m; n = t }
  m > 0
}

fn prime_pairs_lohi(n : u32) {               // for u32 input values
  if n&1 == 1 || n < 4 { return println!("Input not even n > 2"); }
  if n <= 6 { return println!("[{}, {}]\n[{}, {}]\n[{}, {}]",n,1,n/2,n/2,n/2,n/2); };

  // generate the low-half-residues (lhr) r < n/2
  let (ndiv2, rhi) = (n/2, n-2);             // lhr:hhr midpoint, max residue limit
  let mut lhr: Vec<u32> = (3..ndiv2).step_by(2).filter(|&r| coprime(r, n)).collect();

  // identify and store all powers and cross-products of the lhr members < n-2
  let mut lhr_del = Vec::with_capacity(lhr.len() as usize); // lhr multiples, not part of a pcp
  for i in 1..lhr.len()-1 {                  // iterate thru lhr to find prime multiples
    let (mut j, r) = (i, lhr[i-1]);          // for current residue
    let rmax = rhi / r;                      // ri can't multiply r with values > this
    if r < rmax { lhr_del.push(if r*r < ndiv2 {r*r} else {n - r*r} ); } // for r^2 multiples
    if lhr[i] > rmax { break }               // exit if product of consecutive r’s > n-2
    while lhr[j] <= rmax {                   // stop for r if cross-product with ri > n-2
      lhr_del.push(if r*lhr[j] < ndiv2 {r*lhr[j]} else {n - r*lhr[j]}); // store value if < n-2
      j += 1;                                // get next lhr value
  } }

  lhr_del.sort_unstable();                   // remove from lhr its lhr mults, pcp remain
  lhr.retain(|&m| !lhr_del.binary_search(&m).is_ok());
  let lcnt = lhr.len();                      // number of pcp prime pairs
  println!("[{}, {}]", n, lcnt);             // show n and pcp prime pairs count
  println!("[{}, {}]", lhr[0],n-lhr[0]);     // show first pcp prime pair of n
  println!("[{}, {}]", lhr[lcnt-1], n-lhr[lcnt-1]); // show last  pcp prime pair of n
}

fn main() {
  let n: u32 = std::env::args()
    .nth(1).expect("missing count argument")
    .replace('_', "").parse().expect("one input");

  let start = Instant::now();
  prime_pairs_lohi(n);
  println!("{:?}", start.elapsed());
}

use std::time::Instant;

fn coprime(mut m: usize, mut n: usize) -> bool {
  while m|1 != 1 { let t = m; m = n % m; n = t }
  m > 0
}
-----------------------------------------------------------------------------------

fn prime_pairs_lohi(n : usize) {             // for u64 input values
  if n&1 == 1 || n < 4 { return println!("Input not even n > 2"); }
  if n <= 6 { return println!("[{}, {}]\n[{}, {}]\n[{}, {}]",n,1,n/2,n/2,n/2,n/2); };

  // generate the low-half-residues (lhr) r < n/2
  let (ndiv2, rhi) = (n/2, n-2);             // lhr:hhr midpoint, max residue limit
  let mut lhr: Vec<usize> = (3..ndiv2).step_by(2).filter(|&r| coprime(r, n)).collect();

  // identify and store all powers and cross-products of the lhr members < n-2
  let mut lhr_del = Vec::with_capacity(lhr.len() as usize); // lhr multiples, not part of a pcp
  for i in 1..lhr.len()-1 {                  // iterate thru lhr to find prime multiples
    let (mut j, r) = (i, lhr[i-1]);          // for current residue
    let rmax = rhi / r;                      // ri can't multiply r with values > this
    if r < rmax { lhr_del.push(if r*r < ndiv2 {r*r} else {n - r*r} ); } // for r^2 multiples
    if lhr[i] > rmax { break }               // exit if product of consecutive r’s > n-2
    while lhr[j] <= rmax {                   // stop for r if cross-product with ri > n-2
      lhr_del.push(if r*lhr[j] < ndiv2 {r*lhr[j]} else {n - r*lhr[j]}); // store value if < n-2
      j += 1;                                // get next lhr value
  } }

  lhr_del.sort_unstable();                   // remove from lhr its lhr mults, pcp remain
  lhr.retain(|&m| !lhr_del.binary_search(&m).is_ok());
  let lcnt = lhr.len();                      // number of pcp prime pairs
  println!("[{}, {}]", n, lcnt);             // show n and pcp prime pairs count
  println!("[{}, {}]", lhr[0],n-lhr[0]);     // show first pcp prime pair of n
  println!("[{}, {}]", lhr[lcnt-1], n-lhr[lcnt-1]); // show last  pcp prime pair of n
}

fn main() {
  let n: usize = std::env::args()
    .nth(1).expect("missing count argument")
    .replace('_', "").parse().expect("one input");

  let start = Instant::now();
  prime_pairs_lohi(n);
  println!("{:?}", start.elapsed());
}

Both versions are now the fastest of all the languages I've done so far.
The frustrating thing to me, though, is I was so close, but conceptually so far from
knowing the best way to do this. All my searching online never brought me to this:

lhr_del.sort_unstable();
lhr.retain(|&m| !lhr_del.binary_search(&m).is_ok());

This is conceptually so obscure to me (compared to the other languages) it almost seems like sorcery.
After studying it I now understand it, but could never have first thought of it by myself.
I just don't know (seen) enough Rust to think like this in it, though it's conceptually like Python.

On my 16GB laptop the u32 and u64 were able to do n = 4_200_000_000.

➜  ./prime_pairs_lohi_rs32 4_200_000_000
[4200000000, 19883421]
[71, 4199999929]
[2099999989, 2100000011]
92.162005197s

➜  ../prime_pairs_lohi_rs64  4_200_000_000
[4200000000, 19883421]
[71, 4199999929]
[2099999989, 2100000011]
154.718689352s

Here are some times comparisons between the u32 and u64 Rust versions.

-------------------------------------------
       n      |     u32    |      u64
------------_-|------------|---------------
  100,000,000 |    3.35    |     3.57
--------------|------------|---------------
  200,000,000 |    7.77    |     7.75
------- ------|------------|---------------
  300,000,000 |    6.55    |     6.99
--------------|------------|---------------
  400,000,000 |   14.97    |    16.38
---- ---------|------------|---------------
  500,000,000 |   18.47    |    20.47
--------------|------------|---------------
  600,000,000 |   13.47    |    14.53
---- ---------|------------|---------------
  700,000,000 |   21.72    |    23.79
--------------|------------|---------------
  800,000,000 |   30.97    |    34.27
---- ---------|------------|---------------
  900,000,000 |   22.95    |    25.35
--------------|------------|---------------
1,000,000,000 |   40.15    |    45.22

Well, I'm not really a programmer myself (being a mathematician by trade, although with basically zero experience in number theory), but what kind of tipped me off here was the D implementation (although I know very little of D) where lhr_del would ned to be sorted specifically (and now you made me ponder on how to utilize the fact that lhr is sorted ...). I had an experience with this in the past where I had a huge data structure keeping (usize, String)s that I needed to index into where I experimented with HashSet, Vec where the usize would be the index (blowing up memory usage by having needless entries) and Vec keeping (usize, String) as sorted values and binary searching in it. And binary searching easily won out here, even though the number of entries was quite high...

Great that we could speed up the implementation :slight_smile:

You don't normally need to try different approaches like this in Rust, it comes up mainly when trying to optimize as much as possible. Normally you can use roughly the same sorts of algorithms and data structures as in other languages, with some notable exceptions such as linked lists and self-referential data structures in general.

I want to thank everybody who contributed to this thread.
For me, this is the best way to learn real Rust programming.

2 Likes

So, I got interested and looked at how the D standard library computes the set difference. Lo and behold, it does use the fact that both sets are sorted! I implemented this, and it does indeed give a further speedup of around 10%, although we have to make an additional allocation (I'm pretty sure that could be avoided with some gymnastics, but I don't want to do that, and I'm somewhat convinced it doesn't really matter). Code is here: Rust Playground

1 Like

I suggested using BTreeSet instead earlier for this reason, I'm curious if the higher insert cost beats the allocate and sort.

That seems to be pretty hard. Converting lhr into a BTreeSet just for the difference (and making lhr_del a BTreeSet to begin with) actually slows down the whole thing tremendously. I don't see how to make lhr: BTreeSet<usize> from the start, since we need slice indexing per the algorithm, sooo... Here's my try, if you want to have a look: Rust Playground

I guess I'm not too surprised; BTree is O(log n) and Hash is O(1) for insertion, but I was guessing the cache locality of BTree difference might beat it out. It looks like difference is not a particularly naive implementation either...


Double checked the code, you should be able to use first() and last() instead of converting back to a Vec?

Excellent implementation of SetDiff @KillTheMule. Kudos to you!

This is the u32 version using your SetDiff.
For the u64 version just change all the u32 references to usize with a text editor.

use std::time::Instant;

fn coprime(mut m: u32, mut n: u32) -> bool {
  while m|1 != 1 { let t = m; m = n % m; n = t }
  m > 0
}

struct SetDiff<'a> {
  r1: &'a [u32],
  r2: &'a [u32],
}

impl<'a> SetDiff<'a> {
  fn adjust_pos(&mut self) {
    while !self.r1.is_empty() {
      if self.r2.is_empty() || self.r1[0] < self.r2[0] {
        break;
      } else if self.r2[0] < self.r1 [0] {
        self.r2 = &self.r2[1..];
      } else {
        self.r1 = &self.r1[1..];
        self.r2 = &self.r2[1..];
  } } }

  fn new(r1: &'a [u32], r2: &'a [u32]) -> Self {
    let mut s = SetDiff{ r1, r2 };
    s.adjust_pos();
    s
} }

impl<'a> Iterator for SetDiff<'a> {
  type Item = u32;

  fn next(&mut self) -> Option<Self::Item> {
    let val = self.r1.get(0).copied();
    if val.is_some() {
      self.r1 = &self.r1[1..];
    }
    self.adjust_pos();
    return val
} }

fn prime_pairs_lohi(n : u32) {               // for u32 input values
  if n&1 == 1 || n < 4 { return println!("Input not even n > 2"); }
  if n <= 6 { return println!("[{}, {}]\n[{}, {}]\n[{}, {}]",n,1,n/2,n/2,n/2,n/2); };

  // generate the low-half-residues (lhr) r < n/2
  let (ndiv2, rhi) = (n/2, n-2);             // lhr:hhr midpoint, max residue limit
  let mut lhr: Vec<u32> = (3..ndiv2).step_by(2).filter(|&r| coprime(r, n)).collect();

  // identify and store all powers and cross-products of the lhr members < n-2
  let mut lhr_del = Vec::with_capacity(lhr.len() as usize); // lhr multiples, not part of a pcp
  for i in 1..lhr.len()-1 {                  // iterate thru lhr to find prime multiples
    let (mut j, r) = (i, lhr[i-1]);          // for current residue
    let rmax = rhi / r;                      // ri can't multiply r with values > this
    if r < rmax { lhr_del.push(if r*r < ndiv2 {r*r} else {n - r*r} ); } // for r^2 multiples
    if lhr[i] > rmax { break }               // exit if product of consecutive r’s > n-2
    while lhr[j] <= rmax {                   // stop for r if cross-product with ri > n-2
      lhr_del.push(if r*lhr[j] < ndiv2 {r*lhr[j]} else {n - r*lhr[j]}); // store value if < n-2
      j += 1;                                // get next lhr value
  } }

  lhr_del.sort_unstable();                   // remove from lhr its lhr mults, pcp remain
  let lhr: Vec<u32> = SetDiff::new(&lhr, &lhr_del).collect();
  let lcnt = lhr.len();                      // number of pcp prime pairs
  println!("[{}, {}]", n, lcnt);             // show n and pcp prime pairs count
  println!("[{}, {}]", lhr[0],n-lhr[0]);     // show first pcp prime pair of n
  println!("[{}, {}]", lhr[lcnt-1], n-lhr[lcnt-1]); // show last  pcp prime pair of n
}

fn main() {
  let n: u32 = std::env::args()
    .nth(1).expect("missing count argument")
    .replace('_', "").parse().expect("one input");

  let start = Instant::now();
  prime_pairs_lohi(n);
  println!("{:?}", start.elapsed());
}

I did new time comparisions on a quiet system.
I rebooted and only opened a terminal, a text editor, and directory manager.
Using SetDiff is substantially faster, and even a teensy bit more memory efficient, monitoring with htop.

----------------------------------------------------
       n      |   u32  | setdif |   u64  | setdif
------------_-|--------|--------|--------|----------
  100,000,000 |  3.35  |  2.96  |  3.52  |  3.19
--------------|--------|------- |--------|----------
  200,000,000 |  7.77  |  6.19  |  7.75  |  6.58
------- ------|--------|--------|--------|----------
  300,000,000 |  6.40  |  5.73  |  6.91  |  6.06
--------------|--------|--------|--------|----------
  400,000,000 | 14.44  | 12.80  | 16.01  | 14.09
---- ---------|--------|--------|--------|----------
  500,000,000 | 18.44  | 16.32  | 20.47  | 18.25
--------------|--------|--------|--------|----------
  600,000,000 | 13.47  | 12.05  | 14.53  | 12.79
---- ---------|--------|--------|--------|----------
  700,000,000 | 21.72  | 19.51  | 23.40  | 21.22
--------------|--------|--------|--------|----------
  800,000,000 | 30.97  | 27.51  | 34.00  | 29.92
---- ---------|--------|--------|--------|----------
  900,000,000 | 22.95  | 18.30  | 25.35  | 19.42
--------------|--------|--------|--------|----------
1,000,000,000 | 38.99  | 34.81  | 43.26  | 38.56

This SetDiff is a great contribution to Rust, and I think should be in the std library.
Eliminating unwanted elements in a data set is a frequent|important operation in machine learning, statistics, data analytics, etc. Even at least a crate would be nice to have.

Thank you again @KillTheMule.

Well, I did ask if that's a feasible optimization, but I did not get an answer, so I assumed the full range is needed, and first/last only selected for the printout.

You're welcome, but note that I did steal it all from the D standard library, I just ported it to rust :wink:

Stealing is good, if you know the right thing to steal. :smile:

But you still had the insight to do the (good) translation to Rust, and deserve credit for that.
After all, this is what Open Source Software is about. Using the knowledge of others to advance technology!

You can iterate for that, if needed. Not that I think it's going to tip the scales much if it is "tremendously slower", converting back to a Vec should be close to a memcpy! Just thought I'd mention it since it gives a closer comparison of BTree vs other implementations.

I know the Rust devs hate putting things in std they feel is not absolutely necessary, but addressing the last element of an array|vec is so ubiquitous, it would sure make Rust more human centered to be able to do something like lhr.last() or lhr[-1] or lhr[$].

1 Like

All of this doesn't have anything to do with Rust, really. This is data structures and algorithms 101:

  • O(1) is faster than O(log n) is faster than O(n), for large enough inputs
  • a flat array has better cache locality than a linked structure
  • binary search on pre-sorted input is therefore often a good compromise.

Not to belittle the implementation, but this is again just a standard algorithm: the merge part of merge-sort. It's not sorcery, it's been known for 80 years.

Both array and Vec deref to slices, which have first and last methods, along with pretty much every other method that doesn't update the list itself.

1 Like

It sure sounds like you're dismissing it, and you're not acknowledging the bigger point.

@KillTheMule took the time to wonder if D using its SetDiff c|would be a better operation in the given code. He took the interest|time to implement|test it to get it to work. The empirical evidence shows it's much faster, with no memory use negatives.

It doesn't matter if you can go to The Art of Computer Programming, et al, and find algorithms to do it. This function didn't exist in Rust (to my knowledge) before, but now it does. If it can be improved, great!

It jdisturbs me when people take time to do something positive and are met with negative dismissal.

Not sure I catch you. Difference is just an iterator, so to get the last, you need to iterate through it. You might save a bit on allocation if you really need only first and last, but that's what I figured isn't true, so I opted for "saving all results for further use" in all implementations. But yeah, the difference goes like 5s -> 25s, so any single allocation won't make a dent :slight_smile:

Not to make to fine a point of it, but I don't think this sentence is really parseable for a rust beginner. I'd probably just say "lhr.last() and lhr.first() do exist" :wink:

1 Like

The only reason you might not think they do is that you looked at the list of methods available, knowing that deref exists is the missing piece in that case. You need to know to keep scrolling to "Methods from Deref<Target=[T]>"


Dang it, I saw the .last() method and assumed it was a DoubleEndedIterator - it should probably mention it's O(n) but it is on me.

1 Like