I have a test that clones a BTreeMap which looks like this:
const REP: usize = if cfg!(miri) { 2 } else { 100 };
const N: usize = if cfg!(miri) { 100 } else { 100000 };
#[test]
fn std_clone_test() {
let mut m = std::collections::BTreeMap::<usize, usize>::new();
let mut c = m.lower_bound_mut(Bound::Unbounded);
let n = N;
for i in 0..n {
c.insert_before(i, i).unwrap();
}
assert!(m.len() == n);
print_memory();
for _rep in 0..REP {
let cm = m.clone();
assert!(cm.len() == n);
}
}
This takes 0.24 seconds to run. However my implementation takes only 0.03 seconds, that is 8 times faster, which is a lot.
Now the node size can explain some of it, by default my implementation uses B=64 compared to the std implementation B=6, and when I change to B=6 my implementation takes 0.06 seconds to do the clone test, i.e. twice as long, but that still leaves a discrepancy of 4 times.
I have tried looking at the code for the std implementation, but wasn't able to understand it in detail yet. But anyway, can you figure out why it is so much slower?