BTreeMap memory usage

My new "memory optimised" code is far from complete, but I thought I would do some measuring before going further. This is a cherry-picked test, but I think the results are still quite interesting. The test is to insert 2,000 16-byte entries into a BTreeMap (evens first, then odds) and check how much memory was used.


#[test]
fn exp_mem_test()
{
    const N: usize = 51;
    const M: usize = N + 1;
    let n = 1000;
    let mut map = BTreeMap::<u64, u64, N, M>::new();
    for i in 0..n {
        map.insert(i * 2, i * 2);
    }
    for i in 0..n {
        map.insert(i * 2 + 1, i * 2 + 1);
    }
    crate::print_memory();
    println!("Required memory: {} bytes", n * 32);
    println!("size of Leaf={}", std::mem::size_of::<Leaf<u64, u64, N>>());
}

#[test]
fn std_mem_test()
{
    let n = 1000;
    let mut map = std::collections::BTreeMap::<u64, u64>::new();
    for i in 0..n {
        map.insert(i * 2, i * 2);
    }
    for i in 0..n {
        map.insert(i * 2 + 1, i * 2 + 1);
    }
    crate::print_memory();
}

Full code here.

Results:

Memory allocated: 72916 bytes
test lessmem::std_mem_test ... ok

Memory allocated: 37300 bytes
Required memory: 32000 bytes
size of Leaf=816
test lessmem::exp_mem_test ... ok

So std is using nearly twice as much memory for this test, which I think shows there is at least some interesting scope for saving memory here. What I don't yet understand is WHY.

I changed the value of N from 51 to 55 (slightly better due to alignment considerations), and put in some tracing that shows my BTreeMap has very good usage for this test:

Memory allocated: 37252 bytes
Required memory: 32000 bytes
size of Leaf=880
size of NonLeaf=1392
clen(36)=[54, 55, 55, 55, 55, 55, 55, 55, 55, 55, 55, 55, 55, 55, 55, 55, 55, 55, 55, 55, 55, 55, 55, 55, 55, 55, 55, 55, 55, 55, 55, 55, 55, 55, 41]

So most nodes are full (due to adding evens first, then odds - without that they would be half full).

But why does the std implementation do badly here? I guess it is optimised for the case of adding elements in order, and somehow it does (relatively) well there but badly in the evens/odds case. But why? I am also thinking about how to redistribute entries to avoid bad distributions.

[ Incidentally, I think about 4K (to be precise, 4,180 bytes) of the memory allocated is not for the BTreeMap at all, but some kind of fixed overhead, maybe for println! or something... ]

You should consider getting the bytes allocated before the test code runs and subtracting that from the amount after it has run. Also, make sure you're not running tests in parallel (or with any other threads running), as that might skew the results.
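The print_memory function isn't shown in the post; for the baseline-subtraction suggestion above, a minimal counting allocator along these lines would work (the names Counter, ALLOCATED and measure_map_bytes are illustrative, not from the original code):

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

// Tracks net live allocation in bytes by wrapping the system allocator.
struct Counter;

static ALLOCATED: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for Counter {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let p = System.alloc(layout);
        if !p.is_null() {
            ALLOCATED.fetch_add(layout.size(), Ordering::Relaxed);
        }
        p
    }
    unsafe fn dealloc(&self, p: *mut u8, layout: Layout) {
        System.dealloc(p, layout);
        ALLOCATED.fetch_sub(layout.size(), Ordering::Relaxed);
    }
}

#[global_allocator]
static A: Counter = Counter;

// Take a baseline before the inserts and subtract it afterwards, so fixed
// overhead (test harness, println! buffers, etc.) is excluded.
fn measure_map_bytes() -> usize {
    let before = ALLOCATED.load(Ordering::Relaxed);
    let mut map = std::collections::BTreeMap::new();
    for i in 0u64..1000 {
        map.insert(i * 2, i * 2);
    }
    for i in 0u64..1000 {
        map.insert(i * 2 + 1, i * 2 + 1);
    }
    let after = ALLOCATED.load(Ordering::Relaxed);
    drop(map); // keep the map alive until after the measurement
    after.saturating_sub(before)
}

fn main() {
    println!("Map allocated: {} bytes", measure_map_bytes());
}
```

Measuring this way makes the numbers comparable across tests even if the harness allocates differently before each one.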


Another factor to consider in the benchmark is the size of the keys and how expensive they are to compare. With small, cheap-to-compare keys it's better to have a bigger branching factor, since the additional comparisons won't cost you much, but this might not hold for other key types (e.g. String).


At this point "lessmem" is an experiment to see if it is possible to use less memory. I am also experimenting with different branch factors, but not yet timing.

In principle using less memory, also storing keys separately from values, might help with search speed, at least in some cases, but that remains to be seen.

My puzzle for now is why std::collections::BTreeMap is bad at "evens then odds". I haven't tried looking at the code yet to see what it is doing. It would be useful to be able to dump the tree structure to see the node distribution, but I don't think that is possible. Alternatively I could just not worry about it, but I am curious to know what it is doing.

Oh, I am using binary search at the leaf level, so a large node size only matters for insertions, where data has to be shifted up within a node to make space. That appears to be a very fast operation though, even for quite large node sizes. 55 is probably a bit excessive, but the larger the node size, the less memory overhead there is - fewer pointers - (although with diminishing returns).
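The binary-search-then-shift insertion described above can be sketched like this (the function name is illustrative; the real code works on fixed-capacity node storage rather than a Vec):

```rust
// Insert a key into a sorted leaf: binary search finds the slot, then the
// tail is shifted up by one to make room. Vec::insert does the shifting
// (a memmove), which is the "data has to be shifted up" cost mentioned above.
fn leaf_insert(keys: &mut Vec<u64>, key: u64) {
    match keys.binary_search(&key) {
        Ok(i) => keys[i] = key,        // key already present: overwrite
        Err(i) => keys.insert(i, key), // shifts keys[i..] up by one
    }
}

fn main() {
    let mut keys = vec![10, 20, 30, 40];
    leaf_insert(&mut keys, 25);
    assert_eq!(keys, [10, 20, 25, 30, 40]);
    println!("{:?}", keys);
}
```

Because the shift is a contiguous memmove, its cost grows only gently with node size, which is why large nodes remain practical.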

Ok... today I realised I had not been thinking straight when I originally posted.

The std implementation typically will use twice as much memory as needed (plus some extra overhead); it isn't doing anything clever to minimise memory usage.
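A rough back-of-envelope check of that claim: std's BTreeMap uses B = 6, so a leaf holds up to 2*B - 1 = 11 entries, and splits leave nodes roughly half full. Assuming half-full leaves (the occupancy of 6 is an assumption, and internal nodes and headers are ignored), the numbers land in the right ballpark of the ~73,000 bytes measured:

```rust
// Estimate (payload bytes, allocated bytes) for `entries` (u64, u64) pairs,
// assuming std's leaf capacity of 11 slots and ~6 occupied slots per leaf.
fn estimate(entries: u64) -> (u64, u64) {
    let payload = entries * 16; // 16 bytes per (u64, u64) entry
    let capacity = 11;          // slots per leaf (2 * B - 1 with B = 6)
    let occupancy = 6;          // roughly half full after splitting
    let leaves = (entries + occupancy - 1) / occupancy; // ceil division
    (payload, leaves * capacity * 16)
}

fn main() {
    let (payload, allocated) = estimate(2000);
    // prints: payload = 32000 bytes, allocated ~ 58784 bytes
    println!("payload = {} bytes, allocated ~ {} bytes", payload, allocated);
}
```

The remaining gap up to the measured figure would come from internal nodes, per-node headers and allocator rounding.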

I have (likely) abandoned my new code which stores lengths in the parent; I just about got it to work, but it was overly complicated. Instead I have made my Vec implementation more sophisticated: instead of simply allocating a fixed amount, it now allocates incrementally, and trims the allocation when a split happens. This will typically produce fairly drastic memory savings (roughly halving the memory used). I haven't yet checked whether the code is slower or faster (it could be either).

I now have this:

pub(crate) struct ShortVec<T> {
    len: u16,
    alloc: u16,
    cap: u16,
    v: BasicVec<T>,
}

where cap is the maximum capacity (i.e. the vec will never be larger than this, so it will never allocate more than this). alloc is the current capacity, which may be 0 and increases in fixed steps (I picked 5 for now).
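The growth policy described above can be sketched as follows (STEP and next_alloc are illustrative names; the real ShortVec logic may differ):

```rust
// Grow in fixed steps of STEP slots, but never past the hard cap, so the
// allocation tracks the actual length instead of always reserving `cap`.
const STEP: u16 = 5;

fn next_alloc(len: u16, alloc: u16, cap: u16) -> u16 {
    if len < alloc {
        alloc                   // spare room left: no reallocation needed
    } else {
        (alloc + STEP).min(cap) // full: grow by STEP, clamped to cap
    }
}

fn main() {
    assert_eq!(next_alloc(0, 0, 55), 5);    // first push allocates 5 slots
    assert_eq!(next_alloc(5, 5, 55), 10);   // full: grow to 10
    assert_eq!(next_alloc(7, 10, 55), 10);  // spare room: unchanged
    assert_eq!(next_alloc(53, 53, 55), 55); // clamped at the cap
    println!("ok");
}
```

With a cap of 55 and a step of 5, a node that holds only a handful of entries wastes at most 4 slots rather than 50-odd, which is where the halving of memory use comes from.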

Test result showing reduced mem usage compared to std:

C:\Users\ano31\Rust\btree>cargo test exp_mem -F cap -- --nocapture
   Compiling btree_experiment v0.1.87 (C:\Users\ano31\Rust\btree)
    Finished `test` profile [unoptimized + debuginfo] target(s) in 1.80s
     Running unittests src\lib.rs (target\debug\deps\btree_experiment-013752a0c64e683e.exe)

running 1 test
Memory allocated: 330788 bytes
Required memory: 320000 bytes
test mytests::exp_mem_test ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 26 filtered out; finished in 0.02s


C:\Users\ano31\Rust\btree>cargo test std_mem -F cap -- --nocapture
    Finished `test` profile [unoptimized + debuginfo] target(s) in 0.03s
     Running unittests src\lib.rs (target\debug\deps\btree_experiment-013752a0c64e683e.exe)

running 1 test
Memory allocated: 689524 bytes
test mytests::std_mem_test ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 26 filtered out; finished in 0.03s


C:\Users\ano31\Rust\btree>
