Not the best terminology, but I just meant calling the handwritten timing function that I originally posted from main. To be clear, my latest version of it is below, with a few alternatives commented out. On my computer, the "fast mode" runs at the same speed as the "slow mode" in this timing function. Also, I originally had a type conversion using `as usize`; removing it had a significant impact on performance (the version with the conversion was almost twice as slow).

The comments from @Hyeonu about bounds checking being optimized out are consistent with my experience using the handwritten timing function, in particular getting the same performance when explicitly using `get_unchecked`. I'm not sure why the difference shows up in criterion but not in my timing function.

```
use std::time::Instant;

pub fn benchmark_bitmask(iterations: u64) {
    let zero: [u64; 64] = [0; 64];
    let val: u64 = 123456;
    // `y` is used as an array index, so it is inferred as `usize`.
    let mut y = 0;
    let start_time = Instant::now();
    for _ in 0..iterations {
        if (zero[y % 64] & val) == 0 {
            y += 1;
        }
        // "Fast" version, but performance is same
        // if (zero[y] & val) == 0 {
        //     y += 1;
        //     y %= 64;
        // }
        // Explicitly use get_unchecked. Same performance.
        // if unsafe { *zero.get_unchecked(y % 64) == 0 } {
        //     y += 1;
        // }
        // Original version with type conversion. Almost twice as slow.
        // if (zero[(y % 64) as usize] & val) == 0 {
        //     y += 1;
        // }
    }
    let duration = start_time.elapsed();
    println!("y: {}", y);
    println!(
        "benchmark: {} iterations in {}ms ({:.7}ns per iter)",
        iterations,
        duration.as_millis(),
        (duration.as_secs_f32() / (iterations as f32)) * 1e9
    );
}
```
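One thing that might be worth checking is how much the compiler knows about the inputs. In the function above, `zero` and `val` are compile-time constants, so the optimizer is free to simplify the loop body. Below is a minimal sketch of the same loop with the inputs passed through `std::hint::black_box` (stable since Rust 1.66), which is a best-effort hint and not a guarantee. The name `benchmark_bitmask_opaque` and the return type are my own for illustration, and I'm only guessing that this is related to the criterion difference.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical variant of `benchmark_bitmask`: same loop, but the inputs are
/// hidden from the optimizer. Returns the final `y` and the elapsed time.
pub fn benchmark_bitmask_opaque(iterations: u64) -> (usize, Duration) {
    // black_box asks the compiler to treat these values as unknown
    // (best effort only), so it cannot assume the array is all zeros.
    let zero: [u64; 64] = black_box([0; 64]);
    let val: u64 = black_box(123456);
    let mut y: usize = 0;
    let start_time = Instant::now();
    for _ in 0..iterations {
        if (zero[y % 64] & val) == 0 {
            y += 1;
        }
    }
    let duration = start_time.elapsed();
    // Also mark the result as used so the loop is not discarded.
    (black_box(y), duration)
}

fn main() {
    let iterations: u64 = 10_000_000;
    let (y, duration) = benchmark_bitmask_opaque(iterations);
    println!("y: {}", y);
    println!(
        "benchmark: {} iterations in {}ms ({:.7}ns per iter)",
        iterations,
        duration.as_millis(),
        (duration.as_secs_f64() / (iterations as f64)) * 1e9
    );
}
```

If the timings from this version differ from the plain function, that would suggest the handwritten loop was being simplified using the known inputs; if they match, the cause is probably elsewhere.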