I decided this week to look at BTreeMap::get performance for my BTreeMap, compared to the standard implementation, to see if there was any tuning I could do to make it faster. The key function is the search routine; my current code is this:
pub fn search<Q>(&self, key: &Q) -> Result<usize, usize>
where
    K: Borrow<Q> + Ord,
    Q: Ord + ?Sized,
{
    unsafe {
        // Binary search over the initialised keys in the backing buffer.
        let (mut i, mut j) = (0, self.len());
        let p = self.p.as_ptr().cast::<K>();
        let mut m = (i + j) >> 1;
        while i != j {
            match (*p.add(m)).borrow().cmp(key) {
                Ordering::Equal => return Ok(m),
                Ordering::Less => i = m + 1,
                Ordering::Greater => j = m,
            }
            m = (i + j) >> 1;
        }
        // Not found: i is the insertion point.
        Err(i)
    }
}
(from btree_experiment/src/vecs.rs in georgebarwood/btree_experiment on GitHub)
You can see I compute m = (i + j) >> 1 in two places; for some reason this makes it faster (by about 10%), but only for "small" BTreeMaps. I have a vague idea why it might be faster, but I don't know for sure.
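One related thing I could try is a fully branchless variant, which avoids the early Equal exit entirely so every iteration does the same work. Below is only a sketch on a plain sorted slice (safe code, no raw pointer, and the helper name is my own), not a drop-in replacement for the routine above:

```rust
// Sketch: branchless-style binary search over a sorted slice.
// No early Equal exit, so the loop always runs ~log2(len) steps.
fn search_branchless<T: Ord>(a: &[T], key: &T) -> Result<usize, usize> {
    let (mut base, mut size) = (0usize, a.len());
    while size > 1 {
        let half = size / 2;
        let mid = base + half;
        // Same work every iteration; the compiler can often turn this
        // update into a conditional move instead of a branch.
        if a[mid] <= *key {
            base = mid;
        }
        size -= half;
    }
    // base is now the last index with a[base] <= key (or 0).
    if !a.is_empty() && a[base] == *key {
        Ok(base)
    } else {
        Err(base + (!a.is_empty() && a[base] < *key) as usize)
    }
}

fn main() {
    let v = [1, 3, 5, 7];
    assert_eq!(search_branchless(&v, &5), Ok(2));
    assert_eq!(search_branchless(&v, &4), Err(2));
    println!("ok");
}
```

Whether this wins depends on the branch predictor: the Equal-exit version can leave early, but the branchless one trades that for a perfectly predictable loop, which sometimes matters more at small sizes.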
I have tried unrolling the loop (that was slower) and various other things that didn't work, so I would appreciate any ideas I could try. I haven't managed to look at the assembly code yet, and even if I did I am not sure it would tell me much, but if you can get a disassembly and post it, I would appreciate it (or I can work on this myself).
Here are my current criterion benchmark results for Get:
Running benches\crit_bench.rs (target\release\deps\crit_bench-0d8ee347214130b6.exe)
Gnuplot not found, using plotters backend
Get/Exp/50 time: [316.46 ns 317.26 ns 318.17 ns]
change: [-1.1394% +0.1916% +1.4191%] (p = 0.77 > 0.05)
No change in performance detected.
Found 14 outliers among 100 measurements (14.00%)
1 (1.00%) low mild
6 (6.00%) high mild
7 (7.00%) high severe
Get/Std/50 time: [373.89 ns 376.35 ns 379.24 ns]
change: [-3.9764% -0.5858% +2.7481%] (p = 0.75 > 0.05)
No change in performance detected.
Found 6 outliers among 100 measurements (6.00%)
1 (1.00%) high mild
5 (5.00%) high severe
Get/Exp/100 time: [743.96 ns 748.37 ns 754.08 ns]
change: [-5.3205% +0.2289% +5.7422%] (p = 0.94 > 0.05)
No change in performance detected.
Found 6 outliers among 100 measurements (6.00%)
6 (6.00%) high severe
Get/Std/100 time: [897.95 ns 901.67 ns 906.53 ns]
change: [-4.9771% -1.2141% +2.8100%] (p = 0.55 > 0.05)
No change in performance detected.
Found 10 outliers among 100 measurements (10.00%)
5 (5.00%) high mild
5 (5.00%) high severe
Get/Exp/200 time: [2.1381 µs 2.1501 µs 2.1623 µs]
change: [-7.1591% -1.5093% +4.2636%] (p = 0.66 > 0.05)
No change in performance detected.
Found 3 outliers among 100 measurements (3.00%)
3 (3.00%) high severe
Get/Std/200 time: [2.0529 µs 2.0653 µs 2.0819 µs]
change: [-5.4777% -0.7770% +5.5166%] (p = 0.79 > 0.05)
No change in performance detected.
Found 15 outliers among 100 measurements (15.00%)
4 (4.00%) high mild
11 (11.00%) high severe
Get/Exp/500 time: [10.405 µs 10.471 µs 10.545 µs]
change: [-11.610% -8.1001% -4.1150%] (p = 0.00 < 0.05)
Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
2 (2.00%) high mild
5 (5.00%) high severe
Get/Std/500 time: [11.559 µs 11.596 µs 11.637 µs]
change: [-8.5353% -5.6441% -2.9301%] (p = 0.00 < 0.05)
Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
1 (1.00%) low mild
5 (5.00%) high mild
4 (4.00%) high severe
Get/Exp/1000 time: [35.821 µs 35.862 µs 35.906 µs]
change: [+0.0180% +0.9955% +1.9643%] (p = 0.05 > 0.05)
No change in performance detected.
Found 9 outliers among 100 measurements (9.00%)
3 (3.00%) high mild
6 (6.00%) high severe
Get/Std/1000 time: [35.607 µs 36.024 µs 36.596 µs]
change: [+0.2618% +1.3021% +2.4252%] (p = 0.02 < 0.05)
Change within noise threshold.
Found 11 outliers among 100 measurements (11.00%)
3 (3.00%) high mild
8 (8.00%) high severe
Summary: my code is faster for small maps; for bigger maps the times are pretty close.
The benchmark code is here. It simply makes maps of different sizes (50, 100, 200, ...), then does a get for each key and checks the result.
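For anyone who wants to reproduce the shape of the test without pulling in criterion, a minimal stdlib-only sketch of the same idea (my approximation, not the actual bench code, and without criterion's warm-up and statistics) looks like this:

```rust
use std::collections::BTreeMap;
use std::time::Instant;

fn main() {
    for n in [50u64, 100, 200, 500, 1000] {
        // Build a map of n entries.
        let map: BTreeMap<u64, u64> = (0..n).map(|k| (k, k * 2)).collect();
        let start = Instant::now();
        // Do a get for each key and check the result.
        for k in 0..n {
            assert_eq!(map.get(&k), Some(&(k * 2)));
        }
        println!("n = {n}: {:?} total", start.elapsed());
    }
}
```

A single timed pass like this is much noisier than criterion's sampling, so it only gives a rough sanity check on the ordering of the two implementations.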