Best bitarray implementation

I was looking at past threads on this topic, but they were a couple of years old.

I'm looking for the best crate, et al, for using bitarrays.

The max array size for me is ~1.6mb.
I need to be able to clear it fast (set to all 0s or 1s)
and count all the 0s|1s at a given time.

The faster, the simpler, the better.

I think the best crate for this would be bitvec, which has a BitVec type for resizable arrays and a BitArray type for fixed-size arrays.

use bitvec::prelude::*;

let mut v = bitvec![0, 1, 0, 1, 0, 1, 0, 1];
v.fill(false);
assert_eq!(v, bits![0; 8]);
v.fill(true);
assert_eq!(v, bits![1; 8]);
use bitvec::prelude::*;

let v = bitvec![0, 0, 0, 1, 0, 1, 0, 1];
assert_eq!(v.count_zeros(), 5);
assert_eq!(v.count_ones(), 3);
2 Likes

Hey thanks for the reply.

I'll give it a try over the weekend and report back how it performs for my use case.
If there are other suggestions I'll compare them too.

The huge question I have is how the pattern of your bits actually works. Is it big, but sparse? Are there long runs of the same value? Do you only need to clear to 0, to 1, or to both? How often do you check the set bit count compared to how often you update them?

There's a whole variety of possible implementations here with vastly different tradeoffs depending what precisely you need.

2 Likes

Here's the Crystal code I'm trying to do in Rust. It's for a prime sieve.

seg is a bitarray of ks bits, that varies from 250kb to 1.6mb, depending on
input data, but once set never changes in size.

The routine first creates and initialize seg to all 0s|false (primes).
In the 2 inner loops, the algorithm sets the nonprime bit addresses to 1s|true.
It then counts the number of primes in seg.
It then determines the value of the largest prime in seg.
The seg.unsafe_put|fetch turns off bounds checking for speed (I know what I'm doing).
At the end it sets seg to all 0s|false (primes) again and repeats the process until finished.

So there's nothing hard or exotic going on here, just very basic stuff.

I already have fast Crystal|Rust versions where seg is vector of 64-bit mem elements that I process manually. As of Crystal 1.4.0 (April 2022) they greatly improved their bitarray implementation, which is now faster than the 64-bit mem version, which makes the code much simpler, shorter, and easier to understand. I'm trying to see if I can do the same with Rust and not lose any performance.

  sum, ki, kn  = 0_u64, kmin-1, ks                     
  hi_tp, k_max = 0_u64, kmax                           
  seg = BitArray.new(ks)                               
  ki += 1    if ((ki * modpg) + r_hi - 2)  < start_num 
  k_max -= 1 if ((k_max - 1) * modpg + r_hi) > end_num 
  nextp = nextp_init(r_hi, ki, modpg, primes,resinvrs) 
  while ki < k_max                       
    kn = k_max - ki if ks > (k_max - ki) 
    primes.each_with_index do |prime, j| 
                                         
      k = nextp.to_unsafe[j * 2]         
      while k < kn                       
        seg.unsafe_put(k, true)          
        k += prime  end                  
      nextp.to_unsafe[j * 2] = k - kn    
                                         
      k = nextp.to_unsafe[j * 2 | 1]     
      while k < kn                       
        seg.unsafe_put(k, true)          
        k += prime  end                  
      nextp.to_unsafe[j * 2 | 1]= k - kn 
    end                                  
    cnt = seg[...kn].count(false)     
    if cnt > 0                           
      sum += cnt                         
      upk = kn - 1                       
      while seg.unsafe_fetch(upk); upk -= 1 end
      hi_tp = ki + upk                   
    end
    ki += ks                             
    seg.fill(false) if ki < k_max        
  end 

It seems bitvec won't work because it won't accept a variable of the array size.

Can someone show code to make it mimic the Crystal version.

Looking at the documentation for Crystal's BitArray, the most similar type in the bitvec crate would be BitBox. The operations correspond like this (assuming use bitvec::prelude::*; on the Rust side):

BitArray (Crystal) BitBox (Rust)
seg = BitArray.new(ks) let seg = bitbox![0; ks];
seg.unsafe_put(k, true) seg.set_unchecked(k, true);
cnt = seg[...kn].count(false) let cnt = seg[..=kn].count_zeros();
while seg.unsafe_fetch(upk); ... end while *seg.get_unchecked(upk) { ... }
seg.fill(false) seg.fill(false);

I had done literally the same code, except I did: let seg = bitvec![0; ks].
Either way, the problem is it will not create an array with a variable for its size.
And none of the crate's documentation show an example for this case.
I hope this can be done otherwise the crate is pretty useless.

error[E0435]: attempt to use a non-constant value in a constant
   --> src/main.rs:230:30
    |
230 |     let mut seg = bitbox![0; ks]; 
    |                   -----------^^-
    |                   |          |
    |                   |          non-constant value
    |                   help: consider using `let` instead of `const`: `let ELTS`

For more information about this error, try `rustc --explain E0435`.
error: could not compile `twinprimes_ssoz` due to previous error

My apologies; a dynamic length works for the regular vec![] macro, so I assumed it would work just as well for the bitvec![] and bitbox![] macros. It looks like BitVec::repeat() should do the trick, e.g., let seg = BitVec::repeat(false, ks).into_boxed_bitslice();.

I don't know about what the above algorithm is doing.. But I threw together something with bit_field and it seems to do the job efficiently. Tweaked to to contain either bit-vector of ALL numbers, or just the odds.. Was a fun day long project doing some prime math based on shannon entropy from this..

Incidentally it SUCKs that I can't propagate computed constants in rust right now. I had to pass N_HALF as a top-level computed value because I get Can't use type parameters from outer function compiler error when I do a local const N_HALF: u32 = N / 2 where N was the passed in generic const. You can see that I had to calculate N_HALF in the unit test which calls the function.

That being said, the referenced version (with constant compiled N, N_HALF) v.s. a version that just passes n as a parameter is only like 0.5% difference in performance (within the margin of sampling error), so either the compiler already detected the --release build of test as passing in a constant (and thus inlined the hell out of everything) or dynamic dispatch didn't really affect this code. release code with micro-benchmarks are a really really really tricky thing to get right - so this isn't representative (guess I could take in a command line parameter for N).

I "assumed" the same thing too, and it should be that way, but unfortunately isn't.
And Bitvec::repeate(false, ks).into_boxed_bitslice(); won't compile either.
And I tried multiple ways to get something that would at least compile, but no go.
So as it stands now I still can't create Rust code to mimic the Crystal code functionality.

Let me just opine a moment of what this experience with Rust reinforces in my mind.

My primary languages currently are Crystal and Ruby, because I can easily use them to develop with. I've put in allot of time to learn Rust, mostly by translating Ruby/Crystal code to it, but I would never use it to initially develop anything with because its paradigms are too different, and it isn't designed to be friendly to users, as this example highlights.

I would urge for imagining what Rust 2.x should be, end user friendliness should be given the highest priority. I would really, really, really like to use Rust more but right now it's too much of a PITA to get simple stuff done, compared to other tools I can user easier and quicker.

But I still want to know how to get a functional bitarray to work in my Rust code.

That's odd; this code works fine for me:

use bitvec::prelude::*;

fn main() {
    let ks = 128;
    let mut seg: BitBox = BitVec::repeat(false, ks).into_boxed_bitslice();
    seg.fill_with(|i| i % 3 == 0);
    println!("{}", seg.count_zeros());
}

Your issue might be due to a type inference failure. BitVec and BitBox are generic over the underlying storage type and endianness, so sometimes an explicit let v: BitVec = ... or let b: BitBox = ... annotation is necessary to use the default. I can't say for certain, though, since you haven't posted the compiler error you're getting.

Incidentally, it turns out that the bitvec![] and bitbox![] macros actually do support variable sizes! It's just that we have to specify the normally-optional storage type and endianness:

use bitvec::prelude::*;

fn main() {
    let ks = 128;
    let mut seg = bitbox![usize, Lsb0; 0; ks];
    seg.fill_with(|i| i % 3 == 0);
    println!("{}", seg.count_zeros());
}

That should probably be documented better. Rust library authors do care about making their libraries user-friendly, but nobody can be perfect.

The bitvec crate has the best bit-sequence implementation I know of. I can't really help you without knowing what specific problems you're having with it.

Ah, that works now. I didn't have the: let mut seg: BitVec = BitBox... part.

Here's the Rust version of the Crystal code.
It produces the correct results but is appreciably slower than Crystal.
Maybe the crate owner should look at it and see what they're doing to make it faster.

  unsafe {                                                    
    let (mut sum, mut ki, mut kn) = (0usize, kmin-1, ks);     
    let (mut hi_tp, mut k_max) = (0usize, kmax);              
    let mut seg: BitBox = BitVec::repeat(false, ks).into_boxed_bitslice();
    if ((ki * modpg) + r_hi - 2)  < start_num { ki += 1; }    
    if ((k_max - 1) * modpg + r_hi) > end_num { k_max -= 1; } 
    let mut nextp = nextp_init(r_hi, ki, modpg, primes, resinvrs);
    while ki < k_max {                                 
      if ks > (k_max - ki) { kn = k_max - ki }         
      for (j, prime) in primes.iter().enumerate() {    
                                                       
        let mut k = *nextp.get_unchecked(j * 2);       
        while k < kn  {                                
          seg.set_unchecked(k, true);
          k += prime; }                                
        *nextp.get_unchecked_mut(j * 2) = k - kn;      
                                                       
        k = *nextp.get_unchecked(j * 2 | 1);           
        while k < kn  {                                
          seg.set_unchecked(k, true);
          k += prime; }                                
        *nextp.get_unchecked_mut(j * 2 | 1) = k - kn;  
      }                                                

      let cnt = seg[..kn].count_zeros();  
      if cnt > 0 {                        
        sum += cnt;                       
        let mut upk = kn - 1;                          
        while *seg.get_unchecked(upk) { upk -= 1 }
        hi_tp = ki + upk;                              
      } 
      ki += ks;                                        
      if ki < k_max { seg.fill(false); }               
    }                                                  

Thanks for the help! I really appreciate it.
But I wish the documentation would just show this and make users lives easier.

Could you post the rest of your Rust code? I assume this is your Crystal code. I'll see if there's anything in particular slowing it down.

Yes, that's the Crystal version using a bitarray.

Here's the Rust version using 64-bit mem elements in twins_sieve for the seg array.
Compile and run it, then create another file and put in the BitVec code, and you'll see it's much slower.

In general, the Rust version is faster than Crystal w/o using a bitarray, but the bitarray version of Rust is slower than the original Rust code and the Crystal versions w/o using a bitarray.

The bitarray code for Crystal makes it faster (and simpler), especially with large number inputs.
That's not the case (currently) with Rust.

cc @myrrlyn (bitvec author)

Hi! I've been made aware of slowdowns in bitvec since 1.0. Unfortunately I haven't had time to address them since release, as I've been working on buying a house and moving into it (packing my truck … right now, actually).

I suspect that some of the performance cost is from my removal of #[inline] markers on my public functions. I'm going to try that as my first step once I can resume working on this later this summer.

As noted above, bitvec structures rely on type parameters to control their memory layout and behavior. Additionally, bitvec has unremovable slowdowns as a result of my refusal to compromise the API's promise that bit-slices are normal slices and can be selected anywhere in memory, not only on element boundaries. BitArray could be faster if I specialized it to bypass the bit-slice infrastructure (since it always operates on whole elements) but I haven't yet. Sorry.

Anyway, yeah. Dynamic lengths can only use dynamic allocators, not the BitArray type, you have to explicitly specify the full storage type you're using (because collections and protocols use different layouts), you pay a speed penalty in that I have to check for partial edges every time you call my APIs, and I'm thread-safe.

Try using type MyBitBox = BitBox<Cell<usize>, Lsb0>;? That should speed it up a bit if you're not crossing thread boundaries, but the .fill() method gallops across whole elements so I don't think there's much room for perf gain on it :confused:

The API docs try to talk about the overall crate behavior, but unfortunately they're zoomed-in on individual items and there's not as much room to talk about a holistic view except in the crate README and module docs. There's a full user guide at Dedication - A User’s Guide to the `bitvec` Project. It's a lot of text and I completely get not wanting to read a tome (comparatively speaking) for the problem of "I just want to index bits rather than bytes". The curse of generality is API complexity and as much as I try to hide that, ultimately I have to either make it your problem also or else not be usable at all.

If you're willing, please file an issue with minimal reproducible programs so I can bench your code and try to accelerate it.

And please ping me through one of the addresses listed in my CONTRIBUTING if I don't get back to you by August!

2 Likes

I've done some profiling of the program, and the culprit seems to be BitSlice::fill():

    pub fn fill(&mut self, value: bool) {
		for ptr in self.as_mut_bitptr_range() {
			unsafe {
				ptr.write(value);
			}
		}
	}

The Vec-based code compiles to a glibc memset, but this loops over each bit in the assembly. I'll see if I can't come up with a PR with a faster Domain-based implementation.

@jzakiya For now, you can fix this by replacing the final seg.fill(false); with seg.as_raw_mut_slice().fill(0);. With this change, both the Vec and bitvec versions run in exactly the same time (18.0–18.5 s for an upper bound of 1T on my machine).

1 Like

Let me first say I appreciate the time and work you've put in to create this crate.
As a single developer I know you did it out of your interest (and love) for providing it for others.

When you get the time, please look at Crystal's implementation of bitarray.
It may provide some insight and ideas to improve your implementation.
It received a major overhaul that greatly improved its performance in Crystal 1.4.0 in April 2022.
See it's relevant docs below.

https://crystal-lang.org/api/1.4.1/BitArray.html