Numerical casting errors in 32-bit Linux compiler

jzakiya · March 17, 2016, 5:41pm

I have translated C++ code I wrote to Rust to do a Segmented Sieve of Zakiya (SSoZ).
(If you're interested -- The Segmented Sieve of Zakiya (SSoZ) | PDF | Prime Number | Central Processing Unit)

The Rust translation works perfectly on 64-bit Linux systems (over the number ranges I've tested so far). However, on 32-bit systems, the C++ works too but the same Rust code produces array index errors, which I suspect occurs due to casting errors converting from u64 values to usize values.

I'm pretty sure the problem is exclusively in the following function.

fn segsieve(kn: usize,ki: u64,seg: &mut Vec<u8>,next: &mut Vec<usize>,primes: &Vec<usize>,pcnt: usize) -> usize {
                                      // for Kn resgroups in segment
  for b in 0..kn {seg[b] = 0;}        // set every byte bit to prime (0)
  
  for r in 0..RESCNT {                // for each ith (of 8) residues for P5
    let biti = 1 << r;                // set the ith residue track bit mask
    let row = r * pcnt;               // set address to ith row in next[]
    for j in 0..pcnt {                // for each prime <= sqrt(N) for restrack
      if (next[row+j] as u64) < (ki + (kn as u64)) {
        let prime = primes[j];
        let mut k = next[row+j];
        if (k as u64) < ki {
          k = ((ki - (k as u64)) % (prime as u64)) as usize;
          if k != 0 {k = prime - k;}
        } else {k -= ki as usize;}
        while k < kn {                // for each primenth byte < segment size
          seg[k] |= biti;             // set ith residue in byte as nonprime
          k += prime;
        }
      }
    }
  }
  // count the primes ('0' bits) in the segment for the Kn resgroup bytes
  let mut primecnt: usize = 0;
  for s in 0..kn {primecnt += PBITS[seg[s] as usize] as usize}
  primecnt
}

The specific problems seems to be this section.

        if (k as u64) < ki {
          k = ((ki - (k as u64)) % (prime as u64)) as usize;
          if k != 0 {k = prime - k;}
        } else {k -= ki as usize;}
        while k < kn {                // for each primenth byte < segment size
          seg[k] |= biti;             // set ith residue in byte as nonprime
          k += prime;
        }

For some reason, as ki hits a certain size, and when k = 0 for some prime values, the compiler throws this panic:

thread '<main>' panicked at 'index out of bounds: the len is 262144 but the index is 4294967295', ../src/libcollections/vec.rs:1134

This is saying that the u8 array seg[k], which has a length of 262144 bytes, is being indexed with a value of k = 2**32 - 1 (or all bits are 1) . The math of the routine doesn't come close to allowing that to happen, but even when I mask off the k value like k & 0x7ffffffff the error persists. So it seems k is being set sometimes to all 1s instead of 0 under certain conditions.

I've tried every trick I can think of to get around this problem (in Rust 1.6 and 1.7). I have even implemented another version to try to lesson indexing conflicts (which works on 64-bit distros too), which only causes the problem to show at different input values. I, thus, have concluded it must be an error in the 32-bit compiler, since it shows up the same on different 32-bit Linux systems I run in Virtualbox (VB).

jzakiya · March 18, 2016, 3:59pm

OK, here's the whole code you can run and see the problem.

I set the segment size Kn = 262144 bytes (L2 size of I5 cpu), which corresponds to that same number of residue groups (resgroups). With 32-bit compiler, if the input value requires more than 262144 resgroups (in other words we use multiple segments to sieve through) it throws a index panic in segsieve.

When val = 2097150, it fits completely in one segment (262144 resgroups).
It runs correctly (total primes = 155611, last prime = 2097143).

When val = 2097160, this requires 262145 resgroups, or now 2 segment iterations, and bombs.
Again, this same code runs fine on 64-bit systems.

i have another version with a slightly different architecture, but it too experiences the same error starting at bigger input values (when the number of resgroups exceed a value of ~2**32).

I suspect this has to be a casting conversion error when translating u64 values into usize values.

Code for a serial Segmented Sieve of Zakiya (SSoZ) using P5 prime generator.

use std::io;
use std::io::prelude::*;

static PBITS: [u8; 256] = [
 8,7,7,6,7,6,6,5,7,6,6,5,6,5,5,4,7,6,6,5,6,5,5,4,6,5,5,4,5,4,4,3,
 7,6,6,5,6,5,5,4,6,5,5,4,5,4,4,3,6,5,5,4,5,4,4,3,5,4,4,3,4,3,3,2,
 7,6,6,5,6,5,5,4,6,5,5,4,5,4,4,3,6,5,5,4,5,4,4,3,5,4,4,3,4,3,3,2,
 6,5,5,4,5,4,4,3,5,4,4,3,4,3,3,2,5,4,4,3,4,3,3,2,4,3,3,2,3,2,2,1,
 7,6,6,5,6,5,5,4,6,5,5,4,5,4,4,3,6,5,5,4,5,4,4,3,5,4,4,3,4,3,3,2,
 6,5,5,4,5,4,4,3,5,4,4,3,4,3,3,2,5,4,4,3,4,3,3,2,4,3,3,2,3,2,2,1,
 6,5,5,4,5,4,4,3,5,4,4,3,4,3,3,2,5,4,4,3,4,3,3,2,4,3,3,2,3,2,2,1,
 5,4,4,3,4,3,3,2,4,3,3,2,3,2,2,1,4,3,3,2,3,2,2,1,3,2,2,1,2,1,1,0];

// P5 prime generator parameters
static RESIDUES: [usize; 9] = [1, 7, 11, 13, 17, 19, 23, 29, 31];
const RMOD:   usize = 30; // prime generator mod value
const RESCNT: usize = 8;  // number of residues for prime generator

// P7 prime generator based sieve to generate primes <= sqrt(num)
fn sozp7(val: usize) -> (Vec<usize>, usize) {
    let res = [1, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67,
              71, 73, 79, 83, 89, 97,101,103,107,109,113,121,127,131,137,139,
             143,149,151,157,163,167,169,173,179,181,187,191,193,197,199,209,211];

    let md = 210;
    let resnum = 48;

    let mut posn = [0; 210];
    for i in 0..resnum {posn[res[i]] = i-1;};

    let mut modk; let mut r; let mut k;

    let num = val-1 | 1;           // if value even number then subtract 1
    k = num/md; modk = md*k; r=1;  // kth residue group, base num value
    while num >= modk+res[r] {r += 1;} // find last pc position <= num
    let maxpcs  = k*resnum + r-1;      // maximum number of prime candidates
    let mut prms = vec![0u8; maxpcs];  // pc tables array for pcs upto num

    let sqrt_n = (num as f32).sqrt() as usize;

    modk=0; r=0; k=0;

    // sieve to eliminate nonprimes from small primes prms array
    for i in 0.. {
      r += 1; if r > resnum {r = 1; modk += md; k += 1;};
      if  prms[i] == 1 {continue;}
      let prm_r = res[r];
      let prime = modk + prm_r;
      if  prime > sqrt_n {break;}
      let prmstep = prime * resnum;
      let js = res[1..resnum+1].iter()
         .map(|ri| (k*(prime + ri) + (prm_r*ri)/md)*resnum + posn[(prm_r*ri) % md]);
      for i in js {let mut j = i; while j < maxpcs {prms[j] = 1; j += prmstep;}}
    }
    // the prms array now has all the positions for primes r1..N
    // extract prime numbers and count from prms into prims array
    let mut primes = vec![7];   // r1 prime for P5
    let mut pcnt = 1;
    modk=0; r=0;
    for i in 0..maxpcs {
      r += 1; if r > resnum {r = 1; modk += md;};
      if prms[i] == 0 {primes.push(modk + res[r]); pcnt += 1;}
    }
    (primes, pcnt)              // small primes array, and its count
}

fn nextinit(next: &mut Vec<usize>, primes: &Vec<usize>, pcnt: usize) {

  let mut pos = [0; RMOD];
  for i in 1..RESCNT {pos[RESIDUES[i]] = i-1}; pos[1] = RESCNT - 1;

  // for each prime store resgroup on each restrack for prime*(modk+ri)
  for j in 0..pcnt {                  // for the pcnt primes r1..sqrt(N)
    let prime = primes[j];            // for each prime
    let k: usize = (prime-2)/RMOD;    // find the resgroup it's in
    let r: usize = (prime-2)%RMOD + 2;// its base residue value
    for ri in &RESIDUES[1..RESCNT+1] {// for each residue value
      let prod = r * ri;              // create residues cross-product r*ri
      let row  = pos[prod % RMOD];    // find residue track its on
      next[row*pcnt + j] = k*(prime + ri) + (prod-2)/RMOD;
    }
  }
}

fn segsieve(kn: usize,ki: u64,seg: &mut Vec<u8>,next: &mut Vec<usize>,primes: &Vec<usize>,pcnt: usize) -> usize {
                                      // for Kn resgroups in segment
  for b in 0..kn {seg[b] = 0;}        // set every byte bit to prime (0)

  for r in 0..RESCNT {                // for each ith (of 8) residues for P5
    let biti = 1 << r;                // set the ith residue track bit mask
    let row = r * pcnt;               // set address to ith row in next[]
    for j in 0..pcnt {                // for each prime <= sqrt(num) for restrack
      if (next[row+j] as u64) < (ki + (kn as u64)) { //if first multiple < segment end
        let prime = primes[j];        // for this prime <= sqrt(num)
        let mut k = next[row+j];      // and its first prime multiple for this restrack
        if (k as u64) < ki {          // if its < start of segment value
          k = ((ki - (k as u64)) % (prime as u64)) as usize; // jump it into segment
          if k != 0 {k = prime - k;}
        } else {k -= ki as usize;}
        while k < kn {seg[k] |= biti; k += prime'} // sieve each primenth byte < segment size
      }
    }
  }
  // count the primes ('0' bits) in the segment for the Kn resgroup bytes
  let mut primecnt: usize = 0;
  for s in 0..kn {primecnt += PBITS[seg[s] as usize] as usize};
  primecnt
}
  
fn printprms(kn: usize, ki: u64, seg: &[u8]) {
  // Extract and print the primes in each segment:
  for k in 0..kn {                    // for Kn residues groups|bytes
    for r in 0..8 {                   // for each residue|bit in a byte
      if (seg[k] as usize) & (1 << r) == 0 {     // if it's '0' it's a prime
        print!("{} ",(ki+k as u64)*(RMOD as u64) + RESIDUES[r+1] as u64);
      }
    }
  }
  println!(""); 
}

#[allow(non_snake_case)]
fn main() {

  print!("Enter number value: ");

  io::stdout().flush().unwrap();
  let mut input = String::new();
  io::stdin().read_line(&mut input).ok().expect("failed to read line");
  let val: u64 = input.trim().parse().expect("Please type a number!");

  //let val: u64 = 128_861_000_000;       pcnt = 30678; resgroups = 4295366667
  //let val: u64 = 2_097_150; //128_861_000_000;         // find primes <= val (7..2^64-1)

  println!("for val = {}",val);

  const B: usize = 262144;             // L2D_CACHE_SIZE 256*1024 bytes, I5 cpu
  let KB = B;                          // number of segment resgroups
  let mut seg = vec![0u8; B];          // create segment array of B bytesize

  println!("segment has {} bytes and {} residues groups", B, KB);

  let num: u64 = val-1 | 1;            // if val even subtract 1
  let k: u64 = num/(RMOD as u64);      // resgroup for num
  let mut modk: u64 = k*(RMOD as u64); // resgroup base value for num
  let mut r = 1;
  while num >= modk+(RESIDUES[r] as u64) {r += 1;}   // find last pc position <= num
  let maxpcs: u64 = k*(RESCNT as u64) + (r-1) as u64;// maximum number of prime candidates
  let Kmax: u64 = (num-2)/(RESCNT as u64) + 1;       // maximum number of resgroups for val

  println!("prime candidates = {}; resgroups = {}", maxpcs, Kmax);

  let sqrt_n = (num as f64).sqrt() as usize;
  let (primes, pcnt) = sozp7(sqrt_n);  // get pcnt and primes <= sqrt(nun)

  println!("create next[{}x{}] array",RESCNT,pcnt);

  let mut next = vec![0; RESCNT*pcnt]; // create array of first prime multiple locations
  nextinit(&mut next, &primes, pcnt);  // load first multiples resgroups for each prime

  let mut primecnt: u64 = 3;       // 2,3,5 the P5 excluded primes count
  let mut Kn = KB;                 // set sieve resgroups size to segment size
  let mut Ki = 0u64;               // starting resgroup index for each segment

  println!("perform segmented SoZ");

  while Ki < Kmax {                // for KB size resgroup slices up to Kmax
    if Kmax-Ki < (KB as u64) {Kn = (Kmax-Ki) as usize;}// set sieve resgroups size for last segment
    primecnt += segsieve(Kn,Ki,&mut seg,&mut next,&primes,pcnt) as u64; // sieve primes for current segment
    //printprms(Kn,Ki,&seg);       // print primes for the segment (optional)
    Ki += KB as u64;
  }

  let mut lprime: u64;             // get last prime and primecnt <= val                        
  modk = (Kmax-1)*(RMOD as u64);   // mod for last resgroup in last segment
  let mut b = Kn-1;                // num bytes to last resgroup in segment
  r = RESCNT-1;                    // from last restrack in resgroup
  loop {                           // repeat until last prime determined
    if seg[b] & (1 << r) == 0 {    // if restrack in byte[i] is prime
      lprime = modk+(RESIDUES[r+1] as u64); // determine the prime value
      if lprime <= num {break;}    // if <= num exit from loop with lprime
      primecnt -= 1;               // else reduce total primecnt
    }                              // reduce restrack, setup next resgroup
    if r == 0 {r = RESCNT; modk -= RMOD as u64; b -= 1;}; r -= 1; // if necessary
  }
  println!("sqrt(num) = {}, last sieving prime = {}",sqrt_n,primes[pcnt-1]);
  println!("last segment = {} resgroups; segment iterations = {}", Kn, Ki/(KB as u64));
  println!("last prime (of {}) is {}",primecnt,lprime);
}

cuviper · March 18, 2016, 9:06pm

Have you run in debug mode, or at least with debug assertions on? I wonder if you have some unnoticed arithmetic overflows tripping things up...

cuviper · March 18, 2016, 10:58pm

This fails overflow checks right away, even on x86_64. A usize 0-1 on 32-bit would explain the appearance of your 4294967295.

jzakiya · March 20, 2016, 12:46am

Arg..

Apparently it doesn't like negative numbers in the array, because it should be posn[1] = -1

This will fix that.

In fn sozP7 change

   for i in 0..resnum {posn[res[i]] = i - 1;};

to

   for i in 1..resnum {posn[res[i]] = i - 1;} posn[1] = resnum - 1;

then change

   .map(|ri| (k*(prime + ri) + (prm_r*ri/md)*resnum + posn[(prm_r*ri) % md])

to

   .map(|ri| (k*(prime + ri) + ((prm_r*ri - 2)/md)*resnum + posn[(prm_r*ri) % md])

But the reason it was throwing index panic errors was because of this error I found in fn main().

This

let Kmax: u64 = (num-2)/(RESCNT as u64) + 1;

should be this

let Kmax: u64 = (num-2)/(RMOD as u64) + 1;

Making these changes will eliminate the index panic, but.....

It now runs and produces the correct results on 32-bit Linux for inputs upto val = 128_861_000_000, which gives total primes = 5231617477, last prime = 128860999999.

But for input val = 128_862_000_000 the correct prime count is 5,251,656,463, but it gives 5,251,641,960, but gives the correct last prime as 128,861,999,951. So it's sieving out valid primes as nonprimes when the number of residue groups (resgroups) exceeds ~2*32. But this only happens on 32-bit Linux and works correctly on 64-bit Linux.

Here are some sample correct results on 64-bit Linux.

          Val             Pimes Count      Last Prime
      100,000,000,000    4,118,054,813     99,999,999,977
      200,000,000,000    8,007,105,059    199,999,999,949
      500,000,000,000   19,308,136,142    499,999,999,979
    1,000,000,000,000   37,607,912,018    999,999,999,989

cuviper · March 20, 2016, 6:35am

It would help if you shared the full file each time, perhaps in a gist, because that "before" line doesn't exactly match what you posted earlier. But I think I figured that out, and now I get a new overflow on line 80:

next[row*pcnt + j] = k*(prime + ri) + (prod-2)/RMOD;

So I think your top priority should be to make sure you're not overflowing anywhere. Go for a clean run with rustc -g, or if you must have optimization use rustc -O -C debug-assertions=y. (Cargo profiles can set that too.)

birkenfeld · March 20, 2016, 8:18am

Since you're defining posn as [0; 210] the exact type is left to be inferred. Consequently, the inferred type of the array is an unsigned type, probably because you're using its values together with other usizes to make the return value.

To force a signed type, use e.g. [0_i64; 210]. (The underscore is optional, but I like to use it to visually separate the value from the suffix.)

Of course, once you're mixing signedness, you'll have to add casts at the appropriate places.

jzakiya · March 20, 2016, 6:38pm

With your help I FINALLY found the errors of my way!

The reason the code ran correctly on 64-bit cpus is because usize for them is 64-bits but is 32-bits on 32-bit systems. What was happening on 32-bit Linux was some of the intermediary multiplications in nextinit and segsieve were truncating to 32-bit values. So what I had to do was do all the math in those functions as u64 and explicitly cast to as usize just when needed.

Here's the new code for those functions. If there's a cleaner way to do this re-casting please let me know. When I finish cleaning up the code I will create a gist and post the link here.

By the way, the version shown here is not the fastest for doing a serial SSoZ (see my paper on the SSoZ) but it will be the faster way to do a threaded version because the next array, (which is renamed to first in the cleaned up code) is now read-only and can be shared among threads with no issues. When I get a working threaded version(s) I'll post links to that as well.

fn nextinit(next: &mut Vec<u64>, primes: &Vec<usize>, pcnt: usize) {

  let mut pos = [0; RMOD];
  for i in 1..RESCNT {pos[RESIDUES[i]] = i-1}; pos[1] = RESCNT - 1;

  // for each prime store resgroup on each restrack for first prime multiple prime*(modk+ri)
  for j in 0..pcnt {                  // for the pcnt primes r1..sqrt(N)
    let prime = primes[j];            // for each prime
    let k = ((prime-2)/RMOD) as u64;  // find the resgroup it's in
    let r = (prime-2)%RMOD + 2;       // its base residue value
    for ri in &RESIDUES[1..RESCNT+1] {// for each residue value
      let prod = r * ri;              // create residues cross-product r*ri
      let row  = pos[prod % RMOD];    // find residue track its on
      next[row*pcnt + j] = k*((prime + ri) as u64) + ((prod-2)/RMOD) as u64;
    }
  }
}


fn segsieve(kn: usize,ki: u64,seg: &mut Vec<u8>,next: &mut Vec<u64>,primes: &Vec<usize>,pcnt: usize) -> usize {
                                      // for Kn resgroups in segment
  for b in 0..kn {seg[b] = 0;}        // set every byte bit to prime (0)

  for r in 0..RESCNT {                // for each ith (of 8) residues for P5
    let biti = 1 << r;                // set the ith residue track bit mask
    let row = r * pcnt;               // set address to ith row in next[]
    for j in 0..pcnt {                // for each prime <= sqrt(N) for restrack
      if next[row+j] < (ki + (kn as u64)) { // if its 1st multiple resgroup < segment end resgroup
        let prime = primes[j] as u64; // get the prime
        let mut k = next[row+j];      // get its first multiple resgroup value
        if k < ki {                   // if first multiple resgroup < segment start resgroup
          k = (ki - k) % prime;       // jump upto segment start resgroup
          if k != 0 {k = prime - k;}  // compute first resgroup location inside the segment
        } else {k = k - ki;}          // for when first multiple resgroup >= segment start resgroup
        let mut kk = k as usize;      // convert resgroup value inside segment to usize
        while kk < kn {               // for each primenth byte < segment size
          seg[kk] |= biti;            // set ith residue in byte as nonprime
          kk += prime as usize;       // jump to next resgroup byte to sieve
        }
      }
    }
  }
  // count the primes ('0' bits) in the segment for the Kn resgroup bytes
  let mut primecnt: usize = 0;
  for s in 0..kn {primecnt += PBITS[seg[s] as usize] as usize};
  primecnt
}

Topic		Replies	Views
A humble request/challenge	10	1974	January 12, 2023
"as" considered harmful?	80	7283	March 19, 2020
Sharing vector with Rayon par_iter() correctly	29	5353	May 29, 2022
A Friendly Challenge for Rust community	36	4440	January 12, 2023
Why numerical computation casting hell?	24	7830	January 12, 2023

Numerical casting errors in 32-bit Linux compiler

Related topics