The big question I have is how the pattern of your bits actually behaves. Is it big but sparse? Are there long runs of the same value? Do you need to clear only to 0, only to 1, or to both? How often do you check the set-bit count compared to how often you update the bits?
There's a whole variety of possible implementations here, with vastly different tradeoffs depending on what precisely you need.
Here's the Crystal code I'm trying to reproduce in Rust. It's for a prime sieve.
seg is a bitarray of ks bits that varies from 250kb to 1.6mb depending on
the input data, but once set it never changes in size.
The routine first creates and initializes seg to all 0s|false (primes).
In the two inner loops, the algorithm sets the nonprime bit addresses to 1s|true.
It then counts the number of primes in seg.
It then determines the value of the largest prime in seg.
The seg.unsafe_put|unsafe_fetch calls turn off bounds checking for speed (I know what I'm doing).
At the end it sets seg back to all 0s|false (primes) and repeats the process until finished.
So there's nothing hard or exotic going on here, just very basic stuff.
I already have fast Crystal|Rust versions where seg is a vector of 64-bit memory elements that I process manually. As of Crystal 1.4.0 (April 2022) they greatly improved their bitarray implementation, which is now faster than the 64-bit version and makes the code much simpler, shorter, and easier to understand. I'm trying to see if I can do the same with Rust without losing any performance.
sum, ki, kn = 0_u64, kmin - 1, ks
hi_tp, k_max = 0_u64, kmax
seg = BitArray.new(ks)
ki += 1 if ((ki * modpg) + r_hi - 2) < start_num
k_max -= 1 if ((k_max - 1) * modpg + r_hi) > end_num
nextp = nextp_init(r_hi, ki, modpg, primes, resinvrs)
while ki < k_max
  kn = k_max - ki if ks > (k_max - ki)
  primes.each_with_index do |prime, j|
    k = nextp.to_unsafe[j * 2]
    while k < kn
      seg.unsafe_put(k, true)
      k += prime
    end
    nextp.to_unsafe[j * 2] = k - kn
    k = nextp.to_unsafe[j * 2 | 1]
    while k < kn
      seg.unsafe_put(k, true)
      k += prime
    end
    nextp.to_unsafe[j * 2 | 1] = k - kn
  end
  cnt = seg[...kn].count(false)
  if cnt > 0
    sum += cnt
    upk = kn - 1
    while seg.unsafe_fetch(upk); upk -= 1 end
    hi_tp = ki + upk
  end
  ki += ks
  seg.fill(false) if ki < k_max
end
Looking at the documentation for Crystal's BitArray, the most similar type in the bitvec crate would be BitBox. The operations correspond like this (assuming use bitvec::prelude::*; on the Rust side):
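A sketch of the mapping (the unchecked calls need an unsafe block on the Rust side):

BitArray.new(ks)         ->  BitVec::repeat(false, ks).into_boxed_bitslice()
seg.unsafe_put(k, true)  ->  unsafe { seg.set_unchecked(k, true) }
seg.unsafe_fetch(upk)    ->  unsafe { *seg.get_unchecked(upk) }
seg[...kn].count(false)  ->  seg[..kn].count_zeros()
seg.fill(false)          ->  seg.fill(false)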
I had literally done the same thing, except I wrote let seg = bitvec![0; ks];.
Either way, the problem is that it will not create an array whose size is given by a variable.
And none of the crate's documentation shows an example for this case.
I hope this can be done; otherwise the crate is pretty useless.
error[E0435]: attempt to use a non-constant value in a constant
   --> src/main.rs:230:30
    |
230 |     let mut seg = bitbox![0; ks];
    |     -----------^^-
    |     |           |
    |     |           non-constant value
    |     help: consider using `let` instead of `const`: `let ELTS`

For more information about this error, try `rustc --explain E0435`.
error: could not compile `twinprimes_ssoz` due to previous error
My apologies; a dynamic length works for the regular vec![] macro, so I assumed it would work just as well for the bitvec![] and bitbox![] macros. It looks like BitVec::repeat() should do the trick, e.g., let seg = BitVec::repeat(false, ks).into_boxed_bitslice();.
I don't know what the above algorithm is doing, but I threw together something with bit_field and it seems to do the job efficiently. I tweaked it to contain either a bit-vector of ALL numbers, or just the odds. It was a fun day-long project doing some prime math based on Shannon entropy from this.
Incidentally, it SUCKS that I can't propagate computed constants in Rust right now. I had to pass N_HALF in as a top-level computed value, because I get a "can't use type parameters from outer function" compiler error when I write a local const N_HALF: u32 = N / 2, where N is the passed-in generic const. You can see that I had to calculate N_HALF in the unit test which calls the function.
That being said, the referenced version (with constant compiled N, N_HALF) vs. a version that just passes n as a parameter shows only about a 0.5% difference in performance (within the margin of sampling error). So either the compiler already detected that the --release build of the test passes in a constant (and thus inlined the hell out of everything), or the dynamic value didn't really affect this code. Release code with micro-benchmarks is a really, really tricky thing to get right, so this isn't representative (I guess I could take in a command-line parameter for N).
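Here's a minimal sketch of that limitation (the function names are made up for illustration):

// A local `const` item can't refer to the enclosing function's
// const-generic parameter:
fn half_of<const N: u32>() -> u32 {
    // const N_HALF: u32 = N / 2; // error[E0401]: can't use generic
    //                            // parameters from outer function
    let n_half = N / 2; // a plain `let` binding is fine
    n_half
}

// Or thread the computed value through as its own const parameter,
// evaluated at the call site, as described above:
fn half_of_v2<const N: u32, const N_HALF: u32>() -> u32 {
    N_HALF
}

fn main() {
    assert_eq!(half_of::<10>(), 5);
    assert_eq!(half_of_v2::<10, { 10 / 2 }>(), 5);
}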
I "assumed" the same thing too, and it should be that way, but unfortunately isn't.
And Bitvec::repeate(false, ks).into_boxed_bitslice(); won't compile either.
And I tried multiple ways to get something that would at least compile, but no go.
So as it stands now I still can't create Rust code to mimic the Crystal code functionality.
Let me just opine for a moment on what this experience with Rust reinforces in my mind.
My primary languages currently are Crystal and Ruby, because I can easily develop with them. I've put in a lot of time to learn Rust, mostly by translating Ruby/Crystal code to it, but I would never use it to initially develop anything, because its paradigms are too different and it isn't designed to be friendly to users, as this example highlights.
When imagining what Rust 2.x should be, I would urge that end-user friendliness be given the highest priority. I would really, really, really like to use Rust more, but right now it's too much of a PITA to get simple stuff done, compared to other tools I can use more easily and quickly.
But I still want to know how to get a functional bitarray to work in my Rust code.
use bitvec::prelude::*;

fn main() {
    let ks = 128;
    let mut seg: BitBox = BitVec::repeat(false, ks).into_boxed_bitslice();
    seg.fill_with(|i| i % 3 == 0);
    println!("{}", seg.count_zeros());
}
Your issue might be due to a type inference failure. BitVec and BitBox are generic over the underlying storage type and endianness, so sometimes an explicit let v: BitVec = ... or let b: BitBox = ... annotation is necessary to use the default. I can't say for certain, though, since you haven't posted the compiler error you're getting.
Incidentally, it turns out that the bitvec![] and bitbox![] macros actually do support variable sizes! It's just that we have to specify the normally-optional storage type and endianness:
use bitvec::prelude::*;

fn main() {
    let ks = 128;
    let mut seg = bitbox![usize, Lsb0; 0; ks];
    seg.fill_with(|i| i % 3 == 0);
    println!("{}", seg.count_zeros());
}
That should probably be documented better. Rust library authors do care about making their libraries user-friendly, but nobody can be perfect.
The bitvec crate has the best bit-sequence implementation I know of. I can't really help you without knowing what specific problems you're having with it.
Ah, that works now. I didn't have the let mut seg: BitBox = BitVec... part.
Here's the Rust version of the Crystal code.
It produces the correct results but is appreciably slower than Crystal.
Maybe the crate owner should look at it and see what they can do to make it faster.
unsafe {
    let (mut sum, mut ki, mut kn) = (0usize, kmin - 1, ks);
    let (mut hi_tp, mut k_max) = (0usize, kmax);
    let mut seg: BitBox = BitVec::repeat(false, ks).into_boxed_bitslice();
    if ((ki * modpg) + r_hi - 2) < start_num { ki += 1; }
    if ((k_max - 1) * modpg + r_hi) > end_num { k_max -= 1; }
    let mut nextp = nextp_init(r_hi, ki, modpg, primes, resinvrs);
    while ki < k_max {
        if ks > (k_max - ki) { kn = k_max - ki }
        for (j, prime) in primes.iter().enumerate() {
            let mut k = *nextp.get_unchecked(j * 2);
            while k < kn {
                seg.set_unchecked(k, true);
                k += prime;
            }
            *nextp.get_unchecked_mut(j * 2) = k - kn;
            k = *nextp.get_unchecked(j * 2 | 1);
            while k < kn {
                seg.set_unchecked(k, true);
                k += prime;
            }
            *nextp.get_unchecked_mut(j * 2 | 1) = k - kn;
        }
        let cnt = seg[..kn].count_zeros();
        if cnt > 0 {
            sum += cnt;
            let mut upk = kn - 1;
            while *seg.get_unchecked(upk) { upk -= 1 }
            hi_tp = ki + upk;
        }
        ki += ks;
        if ki < k_max { seg.fill(false); }
    }
}
Thanks for the help! I really appreciate it.
But I wish the documentation would just show this and make users' lives easier.
Here's the Rust version using 64-bit memory elements in twins_sieve for the seg array.
Compile and run it, then create another file with the BitVec code in it, and you'll see the latter is much slower.
In general, the Rust version without a bitarray is faster than Crystal's, but the bitarray version of the Rust code is slower than both the original Rust code and the Crystal versions.
The bitarray code makes Crystal faster (and simpler), especially with large number inputs.
That's not the case (currently) with Rust.
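For context, the manual 64-bit approach looks roughly like this (my own sketch, not the actual twins_sieve code):

fn main() {
    let ks = 1_000_000usize;
    // One u64 holds 64 seg positions; round up so ks bits fit.
    let mut seg = vec![0u64; (ks + 63) / 64];

    let k = 12_345usize;
    seg[k >> 6] |= 1u64 << (k & 63); // seg.unsafe_put(k, true)
    let bit = seg[k >> 6] & (1u64 << (k & 63)) != 0; // seg.unsafe_fetch(k)
    // Counting zeros over a prefix [..kn] also needs the last partial
    // word masked; omitted here for brevity.
    let zeros: u32 = seg.iter().map(|w| w.count_zeros()).sum();
    seg.fill(0); // seg.fill(false)
    println!("{} {}", bit, zeros);
}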
Hi! I've been made aware of slowdowns in bitvec since 1.0. Unfortunately I haven't had time to address them since release, as I've been working on buying a house and moving into it (packing my truck … right now, actually).
I suspect that some of the performance cost is from my removal of #[inline] markers on my public functions. I'm going to try that as my first step once I can resume working on this later this summer.
As noted above, bitvec structures rely on type parameters to control their memory layout and behavior. Additionally, bitvec has unremovable slowdowns as a result of my refusal to compromise the API's promise that bit-slices are normal slices and can be selected anywhere in memory, not only on element boundaries. BitArray could be faster if I specialized it to bypass the bit-slice infrastructure (since it always operates on whole elements) but I haven't yet. Sorry.
Anyway, yeah. Dynamic lengths can only use the dynamically-allocating types, not the BitArray type; you have to explicitly specify the full storage type you're using (because collections and protocols use different layouts); you pay a speed penalty in that I have to check for partial edges every time you call my APIs; and bitvec is thread-safe by default.
Try using type MyBitBox = BitBox<Cell<usize>, Lsb0>;? That should speed it up a bit if you're not crossing thread boundaries, but the .fill() method gallops across whole elements, so I don't think there's much room for perf gain there.
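A minimal sketch of that suggestion, reusing the toy example from earlier in the thread:

use bitvec::prelude::*;
use core::cell::Cell;

// Cell<usize> storage opts out of the atomic (thread-safe) element
// accesses, which can help single-threaded hot loops.
type MyBitBox = BitBox<Cell<usize>, Lsb0>;

fn main() {
    let ks = 128;
    let mut seg: MyBitBox =
        BitVec::<Cell<usize>, Lsb0>::repeat(false, ks).into_boxed_bitslice();
    unsafe { seg.set_unchecked(3, true); }
    println!("{}", seg.count_zeros());
}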
The API docs try to talk about the overall crate behavior, but unfortunately they're zoomed in on individual items, and there's not much room for a holistic view except in the crate README and module docs. There's a full user guide at A User's Guide to the `bitvec` Project. It's a lot of text, and I completely get not wanting to read a tome (comparatively speaking) for the problem of "I just want to index bits rather than bytes". The curse of generality is API complexity, and as much as I try to hide that, ultimately I have to either make it your problem too or else not be usable at all.
If you're willing, please file an issue with minimal reproducible programs so I can bench your code and try to accelerate it.
And please ping me through one of the addresses listed in my CONTRIBUTING if I don't get back to you by August!
I've done some profiling of the program, and the culprit seems to be BitSlice::fill():
pub fn fill(&mut self, value: bool) {
    for ptr in self.as_mut_bitptr_range() {
        unsafe {
            ptr.write(value);
        }
    }
}
The Vec-based code compiles to a glibc memset, but this loops over each bit in the assembly. I'll see if I can't come up with a PR with a faster Domain-based implementation.
@jzakiya For now, you can fix this by replacing the final seg.fill(false); with seg.as_raw_mut_slice().fill(0);. With this change, both the Vec and bitvec versions run in exactly the same time (18.0–18.5 s for an upper bound of 1T on my machine).
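In the loop above, that change looks like this:

ki += ks;
// Before: BitSlice::fill sets one bit per iteration
// if ki < k_max { seg.fill(false); }
// After: clears the underlying usize elements directly, which can
// compile down to a memset
if ki < k_max { seg.as_raw_mut_slice().fill(0); }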
Let me first say I appreciate the time and work you've put in to create this crate.
As a single developer, I know you did it out of your interest in (and love of) providing it for others.
When you get the time, please look at Crystal's implementation of bitarray.
It may provide some insight and ideas to improve your implementation.
It received a major overhaul that greatly improved its performance in Crystal 1.4.0 in April 2022.
See its relevant docs below.