Some test results for my BTreeMap

So I did some simple Cursor tests: insert 10,000 records and check they are correct, either by removing them or by just calling next(). I repeat that 1,000 times. The tests are here.
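For context, the shape of the test is roughly this (a sketch using stable BTreeMap APIs rather than the nightly cursor interface; the keys and values here are made up):

```rust
use std::collections::BTreeMap;

// One round of the test: insert records, then verify them both by
// iterating forward (the "next()" variant) and by removing them again
// (the "remove" variant).
fn insert_and_check() {
    let n = 10_000;
    let mut map = BTreeMap::new();
    for i in 0..n {
        map.insert(i, i * 2);
    }
    // Check by walking the map in order.
    for (expected, (&k, &v)) in (0..n).zip(map.iter()) {
        assert_eq!(k, expected);
        assert_eq!(v, expected * 2);
    }
    // Check by removing every record.
    for i in 0..n {
        assert_eq!(map.remove(&i), Some(i * 2));
    }
    assert!(map.is_empty());
}

fn main() {
    for _ in 0..1_000 {
        insert_and_check();
    }
}
```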

Results:

test mytests::std_cursor_insert_test ... ok ... finished in 0.43s
test mytests::std_cursor_remove_test ... ok ... finished in 0.81s
test mytests::exp_cursor_insert_test ... ok ... finished in 0.10s
test mytests::exp_cursor_remove_test ... ok ... finished in 0.20s

Here exp is my code and std is the standard library (std::collections::BTreeMap). So exp is about four times faster on these tests.

As far as I can see, most of the difference is because I am using a "bigger B" in my BTreeMap, which means fewer allocations. I think the standard library is using 6; I am using 20 for leaf nodes and 30 for non-leaf nodes. What I am wondering is why the standard library uses a relatively small number: are there any advantages, given it appears to run slower? [I am running on Windows using mimalloc for testing; the processor is an Intel(R) Core(TM) i5-7200U CPU @ 2.50GHz.]
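A back-of-envelope sketch of why a larger fanout helps here (the fanout values below just echo the numbers above; real node occupancy varies):

```rust
// Rough depth estimate for a B-tree: with fanout f, depth is about log_f(n).
// A wider node means a shallower tree, so fewer allocations and fewer
// pointer chases per operation, at the cost of more work inside each node.
fn depth_estimate(n: f64, fanout: f64) -> f64 {
    n.log(fanout).ceil()
}

fn main() {
    println!("fanout 6:  ~{} levels for 10_000 keys", depth_estimate(10_000.0, 6.0));
    println!("fanout 20: ~{} levels for 10_000 keys", depth_estimate(10_000.0, 20.0));
}
```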

Other tests show a similar pattern, although the difference is not quite so great, e.g.:

test mytests::test_exp_into_iter ... ok ... finished in 0.57s
test mytests::test_std_into_iter ... ok ... finished in 1.02s

It could be that on more "random" access, the smaller B value works better. Perhaps I should do some tests to check if that is the case.

I'm no B-tree expert, but I can imagine it's a tradeoff between node width and tree depth. Random access, as you mentioned, may benefit differently.

Regarding benchmarks, have you had a look at Criterion? It has a black box for preventing things from being optimized into constants, and should be more reliable than timing unit tests. GitHub - bheisler/criterion.rs: Statistics-driven benchmarking library for Rust

3 Likes

FYI, the standard library now has std::hint::black_box(). (It's still good to use Criterion or another benchmarking library, to perform measurements with lower noise, but this specific thing is no longer a reason to take a dependency on Criterion.)
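For anyone following along, a minimal sketch of using it (the workload here is made up, and a proper benchmarking harness is still the better tool):

```rust
use std::collections::BTreeMap;
use std::hint::black_box;
use std::time::Instant;

fn main() {
    let start = Instant::now();
    let mut map = BTreeMap::new();
    for i in 0..10_000u64 {
        // black_box hides the values from the optimizer, so the inserts
        // cannot be folded away or precomputed.
        map.insert(black_box(i), black_box(i));
    }
    black_box(&map);
    println!("inserts took {:?}", start.elapsed());
}
```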

2 Likes

Nice, I missed or forgot that it's available now. But yes, my main point was rather that Criterion is generally helpful with setting up the benchmarks and taking measurements.

To really understand it, I suggest temporarily changing your implementation to use the same B as std and running the tests. Then you can see whether that's the only reason for the performance difference.

Thanks. Regarding Criterion: the compiler cannot remove the asserts, so I think the results are real enough.

When I said it was fewer allocations, I think it may not be exactly that; rather, it is working with bigger chunks of data, so "expensive" visits to a parent node are less frequent, although it is very hard to know exactly what is going on!

I discovered my code didn't work when removing in reverse order. After fixing that, reverse removal is significantly slower, as it requires a visit to the parent each time and is relatively awkward (even though the shifting within the leaf nodes is eliminated). It is still slightly faster than the standard library, though not by nearly so much: 0.36s vs 0.56s.

Yes, I already did that; it was the basis of my original comment. Using the same B, results tend to be much more similar, although there are still some differences: my code tends to be slightly faster, maybe only 10% or so, though it varies from test to test. My forward cursor remove is slower than with the larger B, but still 2x faster than std.

The std implementation stores parent links (and indexes), which is simpler in some ways but will use a bit more memory. A small B value also increases memory overhead slightly (I haven't calculated exactly how much; it would be interesting to check).

1 Like

I'm not questioning your results; I just wanted to mention it as a possible improvement. It helps with iteration, measurements, outliers, comparing to previous runs, and so on. If it were only about the black boxing, you might as well use the one in std.

1 Like

Something nice that Criterion gives you is the ability to parameterise tests, have the results displayed graphically, or directly compare two functions.

It also runs the benchmark a bunch of times, recording each run, which lets you see the timing distribution instead of just an average.
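A hedged sketch of what such a parameterised benchmark can look like (the group name, input sizes, and workload are made up; this goes in a benches/ file with criterion as a dev-dependency):

```rust
use criterion::{criterion_group, criterion_main, BenchmarkId, Criterion};
use std::collections::BTreeMap;

// One benchmark, run once per input size, so Criterion can plot how the
// timing scales with the parameter.
fn bench_insert(c: &mut Criterion) {
    let mut group = c.benchmark_group("btreemap_insert");
    for size in [1_000u64, 10_000, 100_000] {
        group.bench_with_input(BenchmarkId::from_parameter(size), &size, |b, &n| {
            b.iter(|| {
                let mut map = BTreeMap::new();
                for i in 0..n {
                    map.insert(i, i);
                }
                map
            })
        });
    }
    group.finish();
}

criterion_group!(benches, bench_insert);
criterion_main!(benches);
```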

For example:

[image: example Criterion chart showing the timing distribution]

Here is an example benchmark report: Fibonacci2 Summary - Criterion.rs

You can find out more by flicking through the Criterion documentation.

5 Likes

I just found some earlier discussion here:

Since we now do have generic const parameters, it's probably worth a stab. I'm surprised Python found such large sizes to be the sweet spot: the theory is that the main value of B-trees is optimizing L1/L2 cache usage by fitting the entire node in a cache line, so it seems they're mostly just getting value from minimizing comparisons. Perhaps a lack of inlining makes the comparison relatively expensive?

2 Likes

The effects of cache lines are important, of course, but you also want to be doing a decent amount of work when you reach a node, I think!

I have just made a version which has a const generic B (due to const generic restrictions, it is actually the node capacity rather than half of it, as B is usually defined). I also made "compatibility type aliases" so it can still be used as a simple plug-in replacement for std::collections::BTreeMap, with a default value for B.
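Roughly, the alias idea looks like this (the type name, parameter name, and default capacity below are placeholders, not the actual code):

```rust
use std::marker::PhantomData;

// Placeholder sketch: the real map stores nodes; only the alias idea is shown.
pub struct BTreeMapCg<K, V, const CAP: usize> {
    _marker: PhantomData<(K, V)>,
}

impl<K, V, const CAP: usize> BTreeMapCg<K, V, CAP> {
    pub const fn leaf_capacity() -> usize {
        CAP
    }
}

// Compatibility alias with a default capacity, so existing code that names
// BTreeMap<K, V> keeps compiling unchanged.
pub type BTreeMap<K, V> = BTreeMapCg<K, V, 20>;

fn main() {
    assert_eq!(BTreeMap::<u32, u32>::leaf_capacity(), 20);
}
```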

I think the answer is actually right in the documentation ( which I may have read earlier but forgot about ).

Currently, our implementation simply performs naive linear search. This provides excellent performance on small nodes of elements which are cheap to compare. However in the future we would like to further explore choosing the optimal search strategy based on the choice of B, and possibly other factors. Using linear search, searching for a random element is expected to take B * log(n) comparisons, which is generally worse than a BST. In practice, however, performance is excellent.

So the small B (6) that the std implementation currently uses seems to be a consequence of the decision to use linear search. I am now thinking about whether to try storing keys separately from values to improve caching. It might make search quicker, at the expense of making node splitting (and other operations) a little more complicated.
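For concreteness, the per-node linear search described in the quote has roughly this shape (a sketch, not std's actual code):

```rust
use std::cmp::Ordering;

// Scan a node's keys in order, returning Ok(index) on a match or
// Err(insertion_point) otherwise, like slice::binary_search. On small
// nodes with cheap comparisons this is branch-predictor and cache
// friendly, but the expected cost grows linearly with B.
fn linear_search<K: Ord>(keys: &[K], target: &K) -> Result<usize, usize> {
    for (i, k) in keys.iter().enumerate() {
        match target.cmp(k) {
            Ordering::Greater => continue,
            Ordering::Equal => return Ok(i),
            Ordering::Less => return Err(i),
        }
    }
    Err(keys.len())
}

fn main() {
    let keys = [10, 20, 30, 40];
    assert_eq!(linear_search(&keys, &30), Ok(2));
    assert_eq!(linear_search(&keys, &25), Err(2)); // insertion point
}
```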