Allocation tuning for my BtreeMap

Latest development for my BTreeMap is "allocation tuning", by which I mean tuning of the policy for re-sizing or re-allocating the underlying vecs when inserting into the map. I have made it so the tuning can be dynamically adjusted.

So I have a trait like this:

with a method

fn set_seq(&mut self)
Set allocation mode to be optimal for sequential (ordered) inserts.

In practical terms this can be used in the serde deserialisation code to "go faster", like this:

        let mut map = BTreeMap::new();
        let mut tuning = map.get_tuning();
        tuning.set_seq();
        let save = map.set_tuning(tuning);
        ...

( from here ).

Overall this means my serialise/deserialise test runs about 2 x faster than before, and about 9 times faster than std::collections::BTreeMap ( tests are here ) which is quite pleasing.

In the process I also learnt how default generic parameters work, which was a bit mind-blowing frankly, very clever stuff!

Are you testing with random sequences or other patterns? I did only see sorted insertions of u32 on a first glance. For relevant performance characteristics you probably want to include unsorted access with non-trivial comparison operations like Box.

1 Like

I don't do any kind of random sequences, but I think this likely isn't relevant to the serialisation test, which only measures serialising and de-serialising, which I don't think would be significantly affected by how the original data was inserted.

I agree I probably ought to devise some kind of random insertion pattern tests for insertion, but I haven't got around to that yet. I do have tests for forward and reverse insertion ( it makes a big difference ).

I should say, my deserialisation code uses CursorMut insertion, which is much faster than standard insertion (if the data is sorted, which of course it is in the test). Serde cannot do that as std CursorMut is currently only available on nightly. This accounts mostly for why it is 9 times faster ( although there are some other factors, my forward insertion test is faster that std for other reasons, the reverse insertion not so much ). Std does (relatively) well on reverse insertion as it uses sequential rather than binary search, which always terminates instantly for reverse insertion.

Today I did try some randomisation in my "Get" tests. It didn't make any difference.

What did make some difference was using Strings as the key type rather than integers, the standard implementation was a bit slower (nothing very dramatic, maybe a 30% swing), likely because using linear search does slightly more comparisons, so if the comparison is more expensive, that counts against linear search (and the benefits of linear search probably go away due to the extra indirection involved in comparing strings ).

Edit: the random number generation cost (and "String generation" cost) was rather significant ( I wasn't pre-generating the random numbers or the test Strings ), without the random numbers and with pre-generated Strings, the differences between exp (my implementation) and std( the standard library) are more significant for strings.

Benchmark:

fn bench_rget(c: &mut Criterion) {
    //use rand::Rng;
    //let mut rng = rand::thread_rng();
    let mut group = c.benchmark_group("Rget");
    for n in [10, 20, 50, 100, 200, 500, 1000].iter() {
        let n = *n;
        let mut s = Vec::new(); for i in 0..n { s.push(i.to_string()); }
        group.bench_function(BenchmarkId::new("Exp", n), |b| {                       
            let mut map = pstd::collections::BTreeMap::new();
            for i in 0..n {
                map.insert(i.to_string(), i.to_string());
            }
            b.iter(|| {
                for i in 0..n {   
                    /* let ri = rng.gen::<usize>() % n; */                 
                    assert!(map.get(&s[i]).unwrap() == &s[i]);
                }
            })
        });
        group.bench_function(BenchmarkId::new("Std", n), |b| {
            let mut map = std::collections::BTreeMap::new();
            for i in 0..n {
                map.insert(i.to_string(), i.to_string());
            }
            b.iter(|| {
                for i in 0..n {
                    /* let ri = rng.gen::<usize>() % n; */
                    assert!(map.get(&s[i]).unwrap() == &s[i]);
                }
            })
        });
    }
    group.finish();
}

Results:

C:\Users\ano31\Rust\pstd>cargo bench Rget
   Compiling pstd v0.1.9 (C:\Users\ano31\Rust\pstd)
    Finished `bench` profile [optimized] target(s) in 8.63s
     Running unittests src\lib.rs (target\release\deps\pstd-220802aa36492c0d.exe)

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 103 filtered out; finished in 0.00s

     Running benches\crit_bench.rs (target\release\deps\crit_bench-fe180871c2ed95a4.exe)
Gnuplot not found, using plotters backend
Rget/Exp/10             time:   [206.41 ns 218.77 ns 233.37 ns]
                        change: [-74.937% -73.291% -71.491%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 16 outliers among 100 measurements (16.00%)
  8 (8.00%) high mild
  8 (8.00%) high severe
Rget/Std/10             time:   [303.42 ns 317.94 ns 335.98 ns]
                        change: [-63.301% -61.848% -59.986%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
  3 (3.00%) high mild
  9 (9.00%) high severe
Rget/Exp/20             time:   [505.33 ns 524.75 ns 549.85 ns]
                        change: [-71.073% -69.189% -67.095%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
  3 (3.00%) high mild
  10 (10.00%) high severe
Rget/Std/20             time:   [628.97 ns 684.93 ns 748.94 ns]
                        change: [-65.862% -63.148% -59.633%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 16 outliers among 100 measurements (16.00%)
  5 (5.00%) high mild
  11 (11.00%) high severe
Rget/Exp/50             time:   [1.6079 µs 1.7319 µs 1.8928 µs]
                        change: [-68.576% -66.531% -64.460%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 16 outliers among 100 measurements (16.00%)
  6 (6.00%) high mild
  10 (10.00%) high severe
Rget/Std/50             time:   [2.3330 µs 2.5200 µs 2.7597 µs]
                        change: [-55.524% -52.908% -49.851%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
  7 (7.00%) high mild
  5 (5.00%) high severe
Rget/Exp/100            time:   [3.9287 µs 4.1865 µs 4.4995 µs]
                        change: [-63.822% -61.548% -58.836%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 19 outliers among 100 measurements (19.00%)
  8 (8.00%) high mild
  11 (11.00%) high severe
Rget/Std/100            time:   [5.2190 µs 5.6029 µs 6.0511 µs]
                        change: [-55.837% -53.159% -49.993%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
  3 (3.00%) high mild
  9 (9.00%) high severe
Rget/Exp/200            time:   [10.845 µs 11.622 µs 12.538 µs]
                        change: [-61.414% -58.747% -56.046%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
  2 (2.00%) high mild
  11 (11.00%) high severe
Rget/Std/200            time:   [14.047 µs 15.177 µs 16.509 µs]
                        change: [-49.013% -45.383% -41.508%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 20 outliers among 100 measurements (20.00%)
  7 (7.00%) high mild
  13 (13.00%) high severe
Rget/Exp/500            time:   [40.758 µs 43.188 µs 46.001 µs]
                        change: [-46.554% -44.058% -41.140%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
  8 (8.00%) high mild
  4 (4.00%) high severe
Rget/Std/500            time:   [57.927 µs 62.423 µs 67.314 µs]
                        change: [-26.238% -21.209% -16.044%] (p = 0.00 < 0.05)
                        Performance has improved.
Rget/Exp/1000           time:   [95.247 µs 100.17 µs 105.98 µs]
                        change: [-45.232% -42.683% -40.184%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 15 outliers among 100 measurements (15.00%)
  5 (5.00%) high mild
  10 (10.00%) high severe
Rget/Std/1000           time:   [123.90 µs 133.28 µs 143.89 µs]
                        change: [-38.459% -34.479% -30.115%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 18 outliers among 100 measurements (18.00%)
  10 (10.00%) high mild
  8 (8.00%) high severe


C:\Users\ano31\Rust\pstd>