Primer — bit-packed prime sieve (fast + tiny memory, embedded-friendly)

Hi everyone,

I’m new here and wanted to share a crate I put together called Primer: a prime generator/sieve focused on being both fast and extremely memory efficient (especially on constrained targets like ESP32-class devices).​

Repo: https://github.com/whisprer/primer​

What it is

Primer is a bit-packed, odd-only Sieve of Eratosthenes implementation that generates primes in increasing order, with an emphasis on low memory residency and good real-world throughput.​

Why it might be useful

A lot of prime approaches end up memory-heavy (or require tradeoffs that don’t suit embedded / small-footprint work), so Primer aims for “small and sharp”: 1 bit per odd candidate, cache-friendly scanning, and fast bit-iteration to skip work.​

The core tricks used are:

  • Bit packing (1 bit per odd number rather than byte/word-per-number style storage).​
  • Odd-only sieving (hardcode 2, then only represent/process odd candidates).​
  • Fast bit scanning using trailing_zeros (which the README notes compiles down to efficient CPU instructions on x86_64).​
  • Brian Kernighan’s bit-iteration trick (iterate only over set bits rather than scanning all 64 positions).​

Performance & footprint (from the README)

In the included benchmarks, Primer is reported as about 64× faster than the primes crate in certain cases, and can be in the same ballpark as primal depending on the workload.​

For n = 500,000 (41,538 primes), the README’s table lists roughly:

  • Primer: ~4 KB memory, ~2 ms.​
  • primes crate: ~500 KB memory, ~15 ms.​
  • primal crate: ~50 KB memory, ~3 ms.​

The README also includes a scaling table up to 100M with estimated memory/time characteristics (e.g., 10M ≈ 80 KB, ~15 ms; 100M ≈ 800 KB, ~200 ms).​

API / examples

The README includes a simple entry point (sieve_primes(limit) -> Vec<u64>) and shows patterns like generating a prime table once at startup and reusing it for things like hashing/partitioning use cases.​

Feedback welcome

I’d love feedback on the approach, API ergonomics, correctness/testing strategy, and benchmark methodology, and I’m very open to making it more idiomatic or adjusting claims if anyone spots an issue.​

Thanks for taking a look!

Your repository hyperlink has a trailing zero-width space, but even after I remove it, I get a 404.

I am legitimately curious: what is this for? Why do you need to enumerate prime numbers on an ESP32?

1 Like

Why are you attributing something to the README of a repo you supposedly wrote?

In addition to the space at the end you are missing an h in your github username.

After fixing the link, the repo seems to include a lot of AI generated elements.

Which usecase do you envision for embedded here?


A side note on the memory efficiency: the bitpacked representation is O(n) where n is how many numbers are being considering.

Storing each prime separately takes more memory for each number, but allows you to skip non-prime numbers. Since every n numbers there are roughly n/ln n prime numbers this might even out at higher ns.

2 Likes

ok , in order then:
oop - i managed to typo both this post and the post in the crate if the week thread - the process of the dread copypasta i got the same typo in both nan missed the 'h' my very own github... :embarrass or what... D:

  • that should have you now fixed and able to access things

@user16251: you'd be honestly amazed that the number of things that employ the primes crate for any number of stuff from basic math operations where primes are needed of a certain order - say a set of a certain size for a bit of cryptography or for random number seed purposes, and the easiest way to get the size of prime required by far just run super fast though the list to hit the size you need, count 'em off and use those particular ones; then there's the random gens for monte carlo sims/procedural generation of video game landscapes n stuff; furthermore, primes are needed to be generated in large quantities for stuff like other forms of crypto operations - even not so immediate crypto ops apparent, a loada crypto stuff uses a quietly operating set of crypto stuff in background - e.g. RSA keygen needs large random primes, some signature schemes need primes or prime-order groups, often “safe primes” or primes with special structure - so to be able to get to the specific needed primes they gotta climb through the whole set to get to known place; next we then got the terrors of fast arithmetic and modular systems - Finite fields 𝐹𝑝, NTT (Number Theoretic Transform) and Chinese Remainder Theorem (CRT) moving swiftly onwards cos ma scared of that stuff; big-integer multiplication & exact polynomial math (seriously common) employ libraries like GMP/FLINT/NTL and modern “exact math” tooling, all of which use the prime moduli heavily; onward to the worlds of randomness testing, hashing tricks, and “unbiased structure”, number theory / research / verification workloads, data structures / load-balancing / scheduling ( 0 all of which rely I primes stuff like Hash table sizing, Bloom filters / Cuckoo hashing, universal hashing and the statistical test suites for RNGs; next let's consider the number theory world which uses 'em for factoring experiments n verifying conjectures / exploring distributions via prime gaps and constellations; ...ok imma spare you here, and also level with ya- tbh I looked waaaay into this because I was interested originally when doing my cpp work ...and also I din't have a clue either at first so I was wandering the same thing ofc. so to out it simply, I can spew out this kinda examples for another few screens - it just goes on 'n on, but you get the idea i reckon!

  • my personal favs are the DSP-adjacent uses cos prime-based rhythm generators for doing polyrhythms, prime-based LFSR-ish patterning and “non-repeating” stepping pattern generation stuff and prime moduli in procedural pattern generators to avoid short cycles are all right up my street of electronic music, so ofc i gotta get a mention of those in :smiley:

@SkiFire13: Well straight off thanks for the warm welcome by insinuating I didn't actually write the repo or therefore I guess in a any way be responsible for my own work - so you're basically introducing yourself as an absolute ass - welp, frankly real charming. I'm not dumb if you think that was not somehow to be and not to be complained about. yes, I do have times when I employ ai;

(OMG _ CALL THE NEWS - HUMAN BEING EMPLOYS USE OF AI!!! ALERT THE WORLD - THIS MUST NEVER HAVE HAPENED BEOFRE OR BE ALLOWED - WHAT IS THE WORLD COMING TO???? ...or possibly since I have DID I fond it quite useful to lean in the memory of the ai and allow it to nudge me when I suffer the inevitable amnesias and memory loss in a regular basis - pretty much to the point i'll be damned d if at times I could function without the assistance of ai.)

moving on from that=, to answer your question though I'm not certain it deserves it, I attributed the README like that purely because it cam from a whole other repo I wrote a significant period of time before and since I suffer form DSID I experience periods of time long past quite often as if performed by completely different people.

Frankly I'm not fully clear what these use cases of embedded computing your struggling with are - of you're comfortable enough to be familiar with the term embedded computing without having to ask further explanation then one would assume you understand by it's nature it iunv0olves unusually small computer systems that have way less memory power, cpu and other resources - do I really need to further spell this out or do you want the full, essay on 'the uses of embedded computing systems in the 21st century?

I'm afraid that next one has me outright lost as to whether to was a question that i'm baffled by the exact nature of, or just a statement in weird grammar so you'll have to excuse me for not responding frankly I don't think I can.

and finally yes indeed -that extra info i'm sure is muchly appreciated by others reading above - goof for it!

Did you have a specific use case in mind? Most of those applications don't really need to enumerate primes or could make do with a small hardcoded list. Anyway, that being said ... Your crate uses floating-point arithmetic in a few places and if you're running on embedded you probably want to avoid that (and division). Is it really worth pulling in all the code for f64::ln just to estimate \pi(n)? You can take square roots with integer math and even do it with no division, just shifts and adds (see Hacker's Delight). Unless your application needs it you probably want to work with u32 instead of u64.

2 Likes

For example: in such a constrained system like the usual embedded system is, why would I ever want to compute some prime numbers through sieving while I could precompute them at compile time and embed them in the executable?

3 Likes