BitSet vs BitVector

Some weeks ago, I ported my tiny toy chess engine from Nim to Rust, see GitHub - StefanSalewski/rust-chess: Port of salewski-chess from Nim to Rust

For the chess board, I needed a plain 64-bit set, with support to set and test individual bits. In C we would use just an 64-bit int for that, with the ordinary ||, &&, and << operations to access the bits. In Nim I used the Nim BitSet type, which is a built-in fixed size type. For the Rust version, I am using currently the BitSet crate, which works fine. But from the beginning I had the feeling that there is some overhead with the BitSet crate involved, as BitSet can resize, while I need only a fixed size, basically an u64. Recently I found the BitVector crate, which seems to be close to Nim's BitSet, with fixed size. I just tried to replace BitSet with BitVector, but then I noticed that BitVector has no Default::default() implementation, which BitSet provides. But I am using Default for initialization of some of my structs, as in

#[derive(Default, Clone)]
struct HashResult {
    score: HashLine1, // exact values
    floor: HashLine2, // lower bounds
    kks: KKS,
    pri: i64,
    king_pos: usize,
    pop_cnt: i64,
    control: ChessSquares,
    state: State,
    in_check: bool,
}

with ChessSquares being a BitSet or BitVector. Well, I might be able to disable Default, but then all my initializations may become very verbose. So my questions:

Why has BitVector no Default implementation. And can I implement it myself for a 64-bit Bitvector, or can only the crate itself implement that?

And has BitSet some overhead, or is all the overhead removed by the compiler?

I guess that BitVector would have no overhead, compared to bit tests in a plain u64?

Finally, the BitSet API docs at bit_set - Rust contain this sentence:

It should also be noted that the amount of storage necessary for holding a set of objects is proportional to the maximum of the objects when viewed as a usize.

I am still unable to understand it. What are the "objects"? Can the BitSet contain other data types than unsigned integers? And what does "when viewed as a usize" mean here?

PS:

I have started reading Programming Rust, 2nd Edition [Book] now, after the online version of the official book. Really a good book -- but it will take some time to read it, and perhaps I should wait and buy the 2024 edition when available?

Since 64 bits are enough for your usecase I would just use a u64 in Rust just like you did in C. No need to overcomplicate it.

2 Likes

Yes, that might be an option. But the C syntax with ||, &&, and << is very ugly and error prone -- well perhaps there might exists a crate like bitops in Rust for a nicer syntax?

But actually, I am still in the process of learning Rust, that is why I started this thread. I am aware that I can do most of ugly C in Rust somehow :slight_smile:

You can also derive Default manually for HashResult rather than just using the #[derive(Default)].
There are also crates which provide smarter macros which will allow you to give the field a default values.

struct BitVector;

struct HashResult {
    _foo: BitVector,
}

impl Default for HashResult {
    fn default() -> Self {
        Bar { _foo: Foo }
    }
}

Thank you for that hint, I will try that.

Actually I had to correct my remarks to bit manipulation in C, it is obviously just & and | for bitand and bitor, instead of the logical && and ||.

Additionally, I found a bitops crate, see bitops::BitOps - Rust, but that one is very restricted.

And I tried to implement a Default for BitVector, but from the compiler error messages I learned that this can be done only in the crate where BitVector is defined. I tried something like

use bitvector::*;

type BV = BitVector;

impl Default for BV {
    fn default() -> Self { BitVector::new(30) }
}

#[derive(Default)]
struct O {
  b: BV
}

fn main(){

  let mut o: O = Default::default();

  // a bitvector contains 30 elements
  let mut bitvec = BitVector::new(30);
  // add 10 elements
  for i in 0 .. 10 { bitvec.insert(i); }
  // you can use Iterator to iter over all the elements
  assert_eq!(bitvec.iter().collect::<Vec<_>>(), vec![0,1,2,3,4,5,6,7,8,9]);

  let mut bitvec2 = BitVector::new(30);
  for i in 5 .. 15 { bitvec2.insert(i); }

  // set union
  assert_eq!(bitvec.union(&bitvec2).iter().collect::<Vec<_>>(),
             vec![0,1,2,3,4,5,6,7,8,9,10,11,12,13,14]);

  // set intersection
  assert_eq!(bitvec.intersection(&bitvec2).iter().collect::<Vec<_>>(),
             vec![5,6,7,8,9]);

  // set difference
  assert_eq!(bitvec.difference(&bitvec2).iter().collect::<Vec<_>>(),
             vec![0,1,2,3,4]);

  // you can also use `&`(intersection) `|`(union) and `^`(difference)
  // to do the set operations
  assert_eq!((&bitvec ^ &bitvec2).iter().collect::<Vec<_>>(),
             vec![0,1,2,3,4]);
}

Yeah, I meant implement default for HashResult, or ChessSquares rather than BitVector.
In order to implement it for BitVector you will need to wrap it in a newtype like struct Bv(BitVector);.

If ChessSquares is just an alias it would need to become a newtype (just like Bv).
depending on whether you wanted HashResult::default() or ChessSquares::default().

I updated the example with better names.

Yes, I think I understood your hint. The following code compiles and runs, so I think that is a possible solution:

use bitvector::*;

struct O {
  b: BitVector,
  a: [i32; 8],
}

impl Default for O {
    fn default() -> Self {
     let result = Self {a: Default::default(), b: BitVector::new(64)};
     return result;
    }
}

fn main(){
  let mut o: O = Default::default();
  println!("{}", o.a[0]);
}

I got it working with BitVector, and my feeling is, that the code has become a few percent faster compared to BitSet. But my impression is, that BitVector uses not a plain stack-allocated memory block to store the bits, but a Rust Vector with its heap-allocated data buffer. So maybe one of the other alternatives from crates.io: Rust Package Registry might be even a better alternative.

What you want is presumably a bitset with a statically determined size, eg. something like Bitset<64> or Bitset<u64>. I’m a bit flummoxed that such a thing does not actually seem to exist on crates.io, at least none with significant downloads. The fixedbitset crate despite the name is in fact backed by a Vec and not fixed. enumset is great if you have an enum, but not for generic unnamed bits. There’s smallbitset which does provide a Set64 among others, so you might want to check it out.

1 Like

The bitvec crate should offer a BitArray type which requires a compile time known size.

5 Likes

Regarding

It should also be noted that the amount of storage necessary for holding a set of objects is proportional to the maximum of the objects when viewed as a usize.

I think "amount of storage necessary" refers to these lines where, for example, a set.insert(1000) will cause a resize of the underlying vector to a number of blocks large enough to hold at least elements 0..1000. This would be viewable as a usize because vec.len() returns a usize.

What are the "objects"? Can the BitSet contain other data types than unsigned integers?

In theory, BitSet could be backed by any type that implements the trait BitBlock, although I can't think of a use case for defining a custom struct like this. I noticed BitBlock isn't impl for u128 which was interesting.

1 Like

Interesting project. I also am working (learning Rust) on my chessproject, which was first (long time ago) written in Delphi, a while back in C# (which is reasonable but unusable for real speed) and now in Rust.
I just would think that you write a trait for u64 boiling down to intrinsitcs. That is what I have done at least...
(edit): oops, looking again i see that this is not what you were talking about :slight_smile:

1 Like

Thank you all very much for the detailed explanations.

[EDIT]

For the case, that somebody should wonder how crate BitVec can be used with a static 64-bit set, here is a short example:

use bitvec::prelude::*;

fn main() {
    let mut c: BitArr!(for 64, in u64) = bitarr![u64, Lsb0; 0; 64];
    let mut d = bitarr![u64, Lsb0; 0; 64];
    println!("{}", c.len()); // 64
    println!("{}", c[0]); //  false
    c.set(0, true);
    println!("{}", c[0]); // true
    c.set(0, false);
    println!("{}", c[0]); // false
    c.set(0, true);
    let h = 1;
    d.set(h, true);
    let x = c | d; // bitor
    println!("{} {} {} {}", x[0], x[1], x[2], x.any());
}

A short test in my chess engine actually seems to result in slower performance compared to the crates BitSet and BitVector, at least when compiled in debug mode. For now it is enough for me that I have learned how I might use the BitVec crate -- later I might further investigate the actual performance or generated assembly code.

Debug mode is always going to have overhead for generic implementations of "simple" operations, because there's more functions involved. Comparing opt-level=0 performance is not a particularly helpful exercise. With --release, bitvec optimizes well (for usage without split_at_mut, at least).

4 Likes

Actually I am still not sure if I have used BitVec just wrong. I will try to do a performance test in release mode soon. For debug node, and testing within my chess engine, time for generating the first move was about 1.5 sec with BitSet and went to 2.0 sec with BitVec. Now, with my own 64-bit set implementation time is 1.3 sec only. It would be interesting if the generic BitVec has poor performance in debug mode, and performs very well in release mode. Actually, my feeling was, that a syntax like "c.set(0, true);" with the boolean parameter can never be as fast as a pure c.set(bitnum) -- but maybe the compiler can fully optimize this in release mode when inlining.

It's very much expected that abstractions have a cost in debug mode, when the compiler optimizes almost nothing (to improve build times)

1 Like

Maybe I will check this out too. Never had a look at bitvec.
Very curious how popcount, lsb, bit-iterator etc. perform (if they are there) compared to handwritten code.