Some test results for my BTreeMap

So I did some simple Cursor tests: insert 10,000 records and check they are correct, either by removing them or by just calling next(). I repeat that 1,000 times. The tests are here.
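For context, the shape of the test is roughly this (a sketch using stable BTreeMap APIs rather than the nightly cursor interface; the keys and values here are made up):

```rust
use std::collections::BTreeMap;

// One round of the test: insert records, then verify them both by
// iterating forward (the "next()" variant) and by removing them again
// (the "remove" variant).
fn insert_and_check() {
    let n = 10_000;
    let mut map = BTreeMap::new();
    for i in 0..n {
        map.insert(i, i * 2);
    }
    // Check by walking the map in order.
    for (expected, (&k, &v)) in (0..n).zip(map.iter()) {
        assert_eq!(k, expected);
        assert_eq!(v, expected * 2);
    }
    // Check by removing every record.
    for i in 0..n {
        assert_eq!(map.remove(&i), Some(i * 2));
    }
    assert!(map.is_empty());
}

fn main() {
    for _ in 0..1_000 {
        insert_and_check();
    }
}
```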

Results:

test mytests::std_cursor_insert_test ... ok ... finished in 0.43s
test mytests::std_cursor_remove_test ... ok ... finished in 0.81s
test mytests::exp_cursor_insert_test ... ok ... finished in 0.10s
test mytests::exp_cursor_remove_test ... ok ... finished in 0.20s

Here exp is my code and std is the standard library (std::collections::BTreeMap). So exp is about four times faster on these tests.

As far as I can see, most of the difference is because I am using a "bigger B" in my BTreeMap, which means fewer allocations. I think the standard library is using 6; I am using 20 for leaf nodes and 30 for non-leaf nodes. What I am wondering is why the standard library uses a relatively small number: are there any advantages, given it appears to run slower? [I am running on Windows using mimalloc for testing; the processor is an Intel(R) Core(TM) i5-7200U CPU @ 2.50GHz.]
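A back-of-envelope sketch of why a larger fanout helps here (the fanout values below just echo the numbers above; real node occupancy varies):

```rust
// Rough depth estimate for a B-tree: with fanout f, depth is about log_f(n).
// A wider node means a shallower tree, so fewer allocations and fewer
// pointer chases per operation, at the cost of more work inside each node.
fn depth_estimate(n: f64, fanout: f64) -> f64 {
    n.log(fanout).ceil()
}

fn main() {
    println!("fanout 6:  ~{} levels for 10_000 keys", depth_estimate(10_000.0, 6.0));
    println!("fanout 20: ~{} levels for 10_000 keys", depth_estimate(10_000.0, 20.0));
}
```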

Other tests show a similar pattern, although the difference is not quite so great, e.g.:

test mytests::test_exp_into_iter ... ok ... finished in 0.57s
test mytests::test_std_into_iter ... ok ... finished in 1.02s

It could be that on more "random" access, the smaller B value works better. Perhaps I should do some tests to check if that is the case.

I'm no B-tree expert, but I can imagine it's a tradeoff between node width and tree depth. Random access, as you mentioned, may benefit differently.

Regarding benchmarks, have you had a look at Criterion? It has a black box for preventing things from being optimized into constants, and should be more reliable than timing unit tests. GitHub - bheisler/criterion.rs: Statistics-driven benchmarking library for Rust

3 Likes

FYI, the standard library now has std::hint::black_box(). (It's still good to use Criterion or another benchmarking library, to perform measurements with lower noise, but this specific thing is no longer a reason to take a dependency on Criterion.)
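For anyone following along, a minimal sketch of using it (the workload here is made up, and a proper benchmarking harness is still the better tool):

```rust
use std::collections::BTreeMap;
use std::hint::black_box;
use std::time::Instant;

fn main() {
    let start = Instant::now();
    let mut map = BTreeMap::new();
    for i in 0..10_000u64 {
        // black_box hides the values from the optimizer, so the inserts
        // cannot be folded away or precomputed.
        map.insert(black_box(i), black_box(i));
    }
    black_box(&map);
    println!("inserts took {:?}", start.elapsed());
}
```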

2 Likes

Nice, I missed or forgot that it's available now. But yes, my main point was rather that Criterion is generally helpful with setting up the benchmarks and taking measurements.

To really understand it, I suggest temporarily changing your implementation to use the same B as std and running the tests. Then you can see whether that's the only reason for the performance difference.

Thanks. Regarding Criterion: the compiler cannot remove the asserts, so I think the results are real enough.

When I said it was fewer allocations, I think it may not be exactly that; rather, it is working with bigger chunks of data, so "expensive" visits to a parent node are less frequent, although it is very hard to know exactly what is going on!

I discovered my code didn't work when removing in reverse order. After fixing that, reverse removal is significantly slower, as it requires a visit to the parent each time and is relatively awkward (even though the shifting within the leaf nodes is eliminated). It is still slightly faster than the standard library, though not by nearly so much: 0.36s vs 0.56s.

Yes, I already did that; it was the basis of my original comment. Using the same B, results tend to be much more similar, although there are still some differences: my code tends to be slightly faster, maybe only 10% or so, though it varies from test to test. My forward cursor remove is slower than with the larger B, but still 2x faster than std.

The std implementation stores parent links (and indexes), which is simpler in some ways but will use a bit more memory. A small B value also increases memory overhead slightly (I haven't calculated exactly how much; it would be interesting to check).

1 Like

I'm not questioning your results; I just wanted to mention it as a possible improvement. It helps with iteration, measurements, outliers, comparing to previous runs, and so on. If it were only about the black boxing, you might as well use the one in std.

1 Like

Something nice that Criterion gives you is the ability to parameterise tests, have the results displayed graphically, or directly compare two functions.

It also runs the benchmark a bunch of times, recording each run, which lets you see the timing distribution instead of just an average.
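A hedged sketch of what such a parameterised benchmark can look like (the group name, input sizes, and workload are made up; this goes in a benches/ file with criterion as a dev-dependency):

```rust
use criterion::{criterion_group, criterion_main, BenchmarkId, Criterion};
use std::collections::BTreeMap;

// One benchmark, run once per input size, so Criterion can plot how the
// timing scales with the parameter.
fn bench_insert(c: &mut Criterion) {
    let mut group = c.benchmark_group("btreemap_insert");
    for size in [1_000u64, 10_000, 100_000] {
        group.bench_with_input(BenchmarkId::from_parameter(size), &size, |b, &n| {
            b.iter(|| {
                let mut map = BTreeMap::new();
                for i in 0..n {
                    map.insert(i, i);
                }
                map
            })
        });
    }
    group.finish();
}

criterion_group!(benches, bench_insert);
criterion_main!(benches);
```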

For example:

[image: example Criterion chart showing the timing distribution]

Here is an example benchmark report: Fibonacci2 Summary - Criterion.rs

You can find out more by flicking through the Criterion documentation.

5 Likes

I just found some earlier discussion here:

Since we now do have generic const parameters, it's probably worth a stab. I'm surprised Python found such large sizes to be the sweet spot: the theory is that the main value of B-trees is optimizing L1/L2 cache usage by fitting the entire node in a cache line, so it seems they're mostly just getting value from minimizing comparisons. Perhaps a lack of inlining makes the comparison relatively expensive?

2 Likes

The effects of cache lines are important, of course, but you also want to be doing a decent amount of work when you reach a node, I think!

I have just made a version which has a const generic B (due to const generic restrictions, it is actually the node capacity rather than half of it, as B is usually defined). I also made "compatibility type aliases" so it can still be used as a simple plug-in replacement for std::collections::BTreeMap, with a default value for B.
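Roughly, the alias idea looks like this (the type name, parameter name, and default capacity below are placeholders, not the actual code):

```rust
use std::marker::PhantomData;

// Placeholder sketch: the real map stores nodes; only the alias idea is shown.
pub struct BTreeMapCg<K, V, const CAP: usize> {
    _marker: PhantomData<(K, V)>,
}

impl<K, V, const CAP: usize> BTreeMapCg<K, V, CAP> {
    pub const fn leaf_capacity() -> usize {
        CAP
    }
}

// Compatibility alias with a default capacity, so existing code that names
// BTreeMap<K, V> keeps compiling unchanged.
pub type BTreeMap<K, V> = BTreeMapCg<K, V, 20>;

fn main() {
    assert_eq!(BTreeMap::<u32, u32>::leaf_capacity(), 20);
}
```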

I think the answer is actually right in the documentation ( which I may have read earlier but forgot about ).

Currently, our implementation simply performs naive linear search. This provides excellent performance on small nodes of elements which are cheap to compare. However in the future we would like to further explore choosing the optimal search strategy based on the choice of B, and possibly other factors. Using linear search, searching for a random element is expected to take B * log(n) comparisons, which is generally worse than a BST. In practice, however, performance is excellent.

So the small B (6) that the std implementation currently uses seems to be a consequence of the decision to use linear search. I am now thinking about whether to try storing keys separately from values to improve caching. It might make search quicker, at the expense of making node splitting (and other operations) a little more complicated.
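For concreteness, the per-node linear search described in the quote has roughly this shape (a sketch, not std's actual code):

```rust
use std::cmp::Ordering;

// Scan a node's keys in order, returning Ok(index) on a match or
// Err(insertion_point) otherwise, like slice::binary_search. On small
// nodes with cheap comparisons this is branch-predictor and cache
// friendly, but the expected cost grows linearly with B.
fn linear_search<K: Ord>(keys: &[K], target: &K) -> Result<usize, usize> {
    for (i, k) in keys.iter().enumerate() {
        match target.cmp(k) {
            Ordering::Greater => continue,
            Ordering::Equal => return Ok(i),
            Ordering::Less => return Err(i),
        }
    }
    Err(keys.len())
}

fn main() {
    let keys = [10, 20, 30, 40];
    assert_eq!(linear_search(&keys, &30), Ok(2));
    assert_eq!(linear_search(&keys, &25), Err(2)); // insertion point
}
```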