The BitVec crate is actually very fast in release mode

As my initial tests in debug mode showed poor performance, I had the feeling that I might be using it wrong. So I created the small test program below, which measures the performance of setting individual bits in a fixed-size 64-bit set. To prevent the compiler from fully removing or pre-computing the code, I used random numbers for the bit positions. Of course, the loop itself and the random number generation take most of the time, but the difference is still visible. For the BitVec version, I typically get these results in debug and release mode:

# debug mode
$ time cargo run
    Finished dev [unoptimized] target(s) in 0.02s
     Running `target/debug/bitvectest`
Testing crate BitSet
c.len 64
d.len 64
true true

real	0m14.388s
user	0m14.376s
sys	0m0.012s

# release mode
$ time cargo run --release
    Finished release [optimized] target(s) in 0.01s
     Running `target/release/bitvectest`
Testing crate BitSet
c.len 64
d.len 64
true true

real	0m0.168s
user	0m0.148s
sys	0m0.020s

For the version with the custom 64-bit fixed size set I get:

# debug mode
$ time cargo run
    Finished dev [unoptimized] target(s) in 0.01s
     Running `target/debug/bitvectest`
Testing custom bitset
c.len 64
d.len 64
false

real	0m8.314s
user	0m8.306s
sys	0m0.008s

# release mode
$ time cargo run --release
    Finished release [optimized] target(s) in 0.01s
     Running `target/release/bitvectest`
Testing custom bitset
c.len 64
d.len 64
false

real	0m0.169s
user	0m0.149s
sys	0m0.020s

So in release mode, the performance is nearly identical; I even have the impression that the BitVec code runs slightly faster. But in debug mode, the generic BitVec crate seems to have a lot of overhead. I already tried to inspect the assembly code at godbolt.org, but for release mode with LTO I was not able to find the actual code segment.
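One trick that sometimes helps with finding the code in optimized assembly is `std::hint::black_box` (stable since Rust 1.66), which hides a value from the optimizer so the code producing it cannot be constant-folded or removed. A minimal sketch (not from the original program; `fill_bits` is just an illustrative helper):

```rust
use std::hint::black_box;

// Sets every bit position 0..64; black_box on the index keeps the
// optimizer from pre-computing the whole loop at compile time.
fn fill_bits() -> u64 {
    let mut bits: u64 = 0;
    for i in 0..64u64 {
        let bitnum = black_box(i);
        bits |= 1 << bitnum;
    }
    bits
}

fn main() {
    // black_box on the result keeps the loop from being eliminated as dead code.
    println!("{}", black_box(fill_bits()));
}
```

With the result routed through `black_box`, the shift-and-or sequence should stay visible in the disassembly even with LTO enabled.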

use rand::Rng;
use bitvec::prelude::*;

const L: u32 = 10_000_000;

fn main() {    
    let mut c: BitArr!(for 64, in u64) = bitarr![u64, Lsb0; 0; 64];
    let mut d = bitarr![u64, Lsb0; 0; 64];
    println!("Testing crate BitSet");
    println!("c.len {}", c.len()); // 64
    println!("d.len {}", d.len()); // 64
    for _i in 0 .. L { 
        let bitnum = rand::thread_rng().gen_range(0..64);
        c.set(bitnum, true);
        d.set(bitnum / 3, true);
        c.set(bitnum / 5, true);
        d.set(bitnum / 7, true);
    }   
    println!("{} {}", c.any(), d.any());
}


/*
use rand::Rng;
struct BitSet(u64);

impl BitSet {
    fn new() -> Self {
        BitSet(0)
    }
    fn set(&mut self, index: usize) {
        self.0 |= 1 << index;
    }
    fn equals(&self, other: &BitSet) -> bool {
        self.0 == other.0
    }
}

const L: u32 = 10_000_000;

fn main() {
    let mut c: BitSet = BitSet::new();
    let mut d: BitSet = BitSet::new();
    println!("Testing custom bitset");
    println!("c.len 64");
    println!("d.len 64");
    for _i in 0 .. L { 
        let bitnum = rand::thread_rng().gen_range(0..64);
        c.set(bitnum);
        d.set(bitnum / 3);
        c.set(bitnum / 5);
        d.set(bitnum / 7);

    }   
    println!("{}", c.equals(&d));
}
*/
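To take cargo's startup and compile-check overhead out of the measurement, the loop can also be timed in-process with `std::time::Instant`. A minimal sketch, using a tiny xorshift generator as a stand-in for the `rand` crate so the example is self-contained (`xorshift64` is my own helper, not part of the original program):

```rust
use std::time::Instant;

// Tiny xorshift64 PRNG so the example needs no external crates;
// a stand-in for rand::thread_rng().gen_range(0..64).
fn xorshift64(state: &mut u64) -> u64 {
    *state ^= *state << 13;
    *state ^= *state >> 7;
    *state ^= *state << 17;
    *state
}

fn main() {
    let mut bits: u64 = 0;
    let mut state = 0x9E37_79B9_7F4A_7C15_u64; // arbitrary nonzero seed
    let start = Instant::now();
    for _ in 0..10_000_000u32 {
        let bitnum = xorshift64(&mut state) % 64;
        bits |= 1 << bitnum;
    }
    // Only the loop is measured, not process startup or cargo.
    println!("bits set: {}, loop took {:?}", bits.count_ones(), start.elapsed());
}
```

This also makes it easier to compare the two set implementations under identical random sequences, since the seed is fixed.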

My machine:

Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         48 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  16
  On-line CPU(s) list:   0-15
Vendor ID:               AuthenticAMD
  Model name:            AMD Ryzen 9 5900HX with Radeon Graphics
    CPU family:          25
    Model:               80
    Thread(s) per core:  2
    Core(s) per socket:  8
$ cat Cargo.toml 
[package]
name = "bitvectest"
version = "0.1.0"
edition = "2021"

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[dependencies]
bitvec = "1"
rand = "0.8.5"

Yes, this is expected. cargo run being 100x slower by default is a very common footgun.

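A common mitigation, assuming the debug-mode cost really comes from the unoptimized dependency code rather than your own crate, is a dev-profile override in Cargo.toml that optimizes only dependencies:

```toml
# Compile dependencies (bitvec, rand, ...) with optimizations even in
# debug builds, while keeping your own crate unoptimized for fast
# compiles and easy debugging.
[profile.dev.package."*"]
opt-level = 3
```

This keeps `cargo run` debuggable while avoiding most of the 100x slowdown from generic, heavily layered crates like bitvec.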