A humble request/challenge


#1

Hi

I primarly use Ruby for math/science problems/projects because it’s so easy to program in, and it allows me to think about how to solve problems without worrying about how to code it. I’ve also played around with Crystal (aka Ruby on steroids) but it’s still young, and doesn’t let me (currently) do what I want, or without a lot of hassles, and I’ve just recently started looking at Nim (less than a month at time of writing).

Back in 2014/15 I started looking into Rust, which was before it hit 1.0. Now that it’s reached a good level of stability I was thinking of maybe trying to really learn it again.

I am really interested in the capabilities of these language to do real parallel processing of math algorithms.

I developed a prime sieve called the Sieve of Zakiya (SoZ), and it’s companion the Segmented Sieve of Zakiya (SSoZ). I wrote a paper, The Segmented Sieve of Zakiya (SSoZ) which describes its mathematical foundations and algorithm, and provide a working C++ implementation at the end of the paper (compiled, run, and verified), though I don’t consider myself a C++ programmer, just funcitonal enough in it. It’s a relatively short program.

Here’s a link to read and download the paper:

My humble request/challenge is for a/some skilled Rustaceans to translate the code into idiomatic Rust to demonstrate the best way to code the algorithm in it. Extra points if someone can do a true parallel version of the algorithm, which I attempted to do using OpemMP, but what I did didnt’ seem to make the code faster than the serial version (see paper).

I assume coding this the Rust way would look different than the C++ code, and all I can possible do now is try to do a direct translation to Rust using the same (probably suboptimal) structure.

Ultimately, I’d like to publish the results of benchmarks in different languages doing the SSoZ in an updated paper.

If anyone would be willing to take up the challenge I’d be pleased to answer any questions the best I can.

Thanks in advance.

Jabari


#2

Note to self: give this a shot.

@dobkeratops has also got a fair amount of experience with high performance coding so he may be someone to talk to.


#3

@jzakiya just out of curiosity, what programming language did you use for your prototype code? It doesn’t look like C++ or Matlab or any other language I’ve used before.


I also did a line-for-line translation of your Listing 1 and it looks like it has a bug or two:

There’s a line where you accidentally underflow (if using unsigned numbers) or put a negative number in your pos array. When i is 0, you are doing 0 - 1, which underflows an unsigned integer.

for (i = 0; i < rescnt; i++) pos[residues[i]] = i - 1;

There are also several places where you do things inefficiently or waste memory. For example allocating an entire temporary array at runtime to put the primes you find for that iteration (primes[] = new [max]; // allocate mem at runtime), only to throw the array away immediately without actually doing anything with the primes (kinda defeating the purpose of the entire exercise…).


Also, don’t forget that vowels and whitespace don’t cost anything :wink: Using smaller variable names like rescnt or prms isn’t going to make your code run any faster.


#4

Hi Michael-F-Bryan

Thanks for looking at the code.

The C++ code as written works, i.e. it compiles (using g++ on Linux) and produces correct results.

The first snippet you address in Listing 1 does (intentionally) put the value -1 in the pos array, which if you look is signed int in the final source code. The pos array could also be a hash but it was easier (faster) for me to do it as an array in C++. If you look in the final code at the end of the paper (the actual code that is compiled) its renamed as posn.

int posn[210];
for (int i; i < rescnt; i++) posn[residues[i]] = i - 1;

So here the posn array is a signed int array, which will take negative values (I only use -1, which I can eliminate but doing so makes the code more clunky).

The primes array is a global parameter which is filled by the sozP7 function, and holds the primes <= sqrt(N), which is then used in the nextinit function to create the next array.

I would encourage you to compile the source code shown at the end of the paper for ssozp5cpp.cpp, which you can copy from the paper, or download the source code from below, which has other versions too:

The C++ version is nowhere near optimal, since its not my primary language and tt’s over 3 years old now. I’m sure someone who is fluent in C++ can create a much faster/flexible implementation. That’s why I’m requesting for a truly idomatic Rust version from someone who really is fluent in it, to showcase how it can do this algorihm faster and more efficient.


#5

Ah, that makes sense. I thought it must have been a mistake because you almost never see an array which starts like [-1, 0, 1, 2, ...]. Something like that is typically the result an off-by-one error, and I assumed you would be using unsigned integers because then that cuts the total amount of numbers you can check in half (assuming you aren’t using a big integer library for performance reasons).


Sorry if I came across as a little passive-aggressive in my previous post! I was finding it really hard to understand your code snippets because everything was very terse and compact and got a little frustrated. That’s not your fault though! Your priority would have been to just get the algorithm down on paper and to make the resulting article as small as possible, while I came in with all my previous assumptions about coding and expected to understand everything at a glance.

After going through the Python example and rereading your paper I managed to get my head around it your algorithm. It’s actually pretty smart and I reckon the concept of PrimeGenerators will tie in really well with Rust’s iterators and parallelism (particularly the rayon library).

I think I’m going to rewrite it in Rust from scratch using the paper as a guideline instead of slowly translating the C++ code to idiomatic Rust. Doing it in a more functional (as in Functional Programming) way should make it for others easier to understand and hopefully still be pretty performant.


#6

I’m glad I answered your questions sufficiently, and took no offense in your manner of asking.

Thank you for showing interest and I really appreciate you attempting to rewrite it in a different paradigm. This is exactly what I was hoping someone would do, to take advantage of the best Rust techniques.

Please continue to ask me more questions concerning anything about the algorithm, its math, or the paper.


#7

I’ve posted my initial implementation up on GitHub (repo). I’m still working on the last step, getting rid of non-primes in the prime candidates list, but once I get that working I can code up a couple other prime number generators and do benchmarks to see how they perform.

EDIT: Check out the sieve_of_zakiya() function. Even though it’s written in Rust, it should still be fairly understandable if you just go by variable/method names.

I don’t know if you can make adjustments to your paper now that it’s already been published, but you can probably make it a lot more approachable and understandable by simply using better datastructures and breaking down the algorithm into levels of abstraction.


For example, the algorithm for the Sieve of Zakiya is (very roughly):

  1. Generate a bunch of prime candidates using a Prime Generator.
  2. Filter out all non-primes using the residues
  3. Merge the list of primes we’ve found with the small list of excluded primes.

Both steps 1 and 2 are almost trivially parallelizable, while step 3 is expected to be negligible seeing as you are trying to merge a really small list of sorted numbers with a really big one.

Now that the reader has a general idea of how the sieve works, you can break down how steps 1 and 2 are done. Introducing the concept of a prime generator, how they work, which generator is better than others and why, etc.

Another thing is you’ll often mix the algorithm’s logic with bookkeeping things (array lengths, indices, miscellaneous intermediates, etc), which makes it a little hard to figure out what is being done. A lot of that can be fixed by selecting the appropriate data structure for the task at hand though.


#8

Hey, thanks for the feedback.

Looking at the paper now in August 2017, three years after its release in 2014, I also see lots of ways I can explain it better.

I intend to update the paper, and one of the things I’ll do is use more illustrations. This will require more work (to create them) but it will make the written explanations of the math and algorithm clearer.

Another thing I’m considering is doing a video, but that’s a bigger project that I don’t currently have the knowledge or resources to do well.

Now that I have a more powerful laptop to work with than when I developed and wrote the paper I can attempt to explore some parallel implementations. This is one reason I’m looking at other language implementations, especially those that claim native parallel programming capabilities.

Please continue to provide more critiques and suggestions. I really don’t have people to bounce my ideas off of to get feedback from so I welcome your thoghtful observations for improvement.


#9

I forgot to mention this in my previous post.

One improvement I intend to add is the ability to sieve primes over ranges of numbers (start_num - end_num). I explained how to implement the next function in the paper to accommodate this.

I originally did all my work in Ruby, and created a rubygem named primes-utils. I also wrote a companion book for the gem, the Primes Utils Handbook which explains in detail the math, algorithms, and provides all the code for the gem. You can read and download it from the following link.

Because Ruby doesn’t allow you (yet) to define byte elements, I couldn’t do a SSoZ implementation in it, but the Handbook shows in detail the math and code for sieving over ranges. It may be good to install ruby and the gem and play with it to see how it works. Maybe some of those methods can be put into a Rust library too.


#10

Besides looking at Rust I also started looking at Nim, which also says its a system programming language to replace C/C++. It has a lot less steeper learning curve than Rust, and I was able to translate some old (2014) C++ code of different SSoZ programs in a couple of weeks. Here is list of gist for some of this code.

ssozp5.nim

Find the primes <= N using sequential Segmented Sieve of Zakiya (SSoZ) with P5 prime generator, written in Nim

nthprime_ssozp5.nim

Find Nth Primes using sequential Segmented Sieve of Zakiya (SSoZ) with P5 prime generator, written in Nim

twinprimes_ssozp5.nim

Find Twin Primes <= N using sequential Segmented Sieve of Zakiya (SSoZ) with P5 prime generator, written in Nim

The Nim code is much less “noisy” than the C++ versions, and actually runs faster, and is easier to understand, since I’ve verbosely documented almost every line in it. They should be much easier to create Rust translations from than the C++ versions.

I’ve rearchitectured the memory model to make it more parallel friendly to implement, which is what I really want to see how well Rust can implement the algorithm. I haven’t figured out how to do that in Nim yet (it’s only at 0.17.0, and may not currenty be capable of doing it).

It really would be interesting to see comparisons between Nim vs Rust implemenations (source code, speed, executable size, memory, etc). Still hoping someone can do a Rust translation.