I'm trying to implement an in-memory service that has to store millions of items (Arc pointers). The naive data structure to represent this is the following:
HashMap1(key1, HashMap2(key2, BTree(entries)))
HashMap1 contains around 1M keys.
HashMap2 contains around 20M keys in total (not per key1).
The BTree contains around 10 entries per key2.
The data is held in memory for a few minutes (say 5-10 min) while multiple tasks read from and write to it (write heavy).
Now, I've tried using DashMap for the two hashmaps and a plain BTree (since I lock the HashMap2 entry to insert items into the tree), but I realized the system uses around 10-15 GB of memory.
Note that in this test the outer DashMap was right-sized, with an initial capacity of 1M items.
I tried switching to crossbeam's SkipMap (for HashMap1 and 2) and SkipSet (for the BTree), thinking a SkipMap would be more memory efficient, but I didn't get better results.
I know I can fit the same data in around 1.5 GB if I use BTrees for HashMap1 and 2, removing any concurrency, so I'm very surprised by the memory overhead. Am I just missing something obvious?
Is there any concurrent data structure I can use that is memory efficient? I'm willing to give up the O(1) hashmap access.
Try DashMap (or some other concurrent map) for the outer one and a mutex/rwlock-wrapped regular HashMap for the inner one. Unless your dataset is extremely biased, distributing the locking over so many keys might provide enough concurrency.
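A rough sketch of that layout (the key and entry types here are placeholders, not from your post):

```rust
use std::collections::{BTreeSet, HashMap};
use std::sync::{Arc, Mutex};
use dashmap::DashMap;

type Key1 = u64;       // placeholder key types
type Key2 = u64;
type Entry = Arc<str>; // placeholder entry type

// Outer concurrent map; each inner HashMap sits behind its own Mutex,
// so the locking is spread across ~1M independent locks.
type Store = DashMap<Key1, Mutex<HashMap<Key2, BTreeSet<Entry>>>>;

fn insert(store: &Store, k1: Key1, k2: Key2, entry: Entry) {
    // The DashMap entry keeps its shard locked while we take the inner Mutex,
    // which is fine for a short insert.
    let inner = store.entry(k1).or_insert_with(|| Mutex::new(HashMap::new()));
    inner
        .lock()
        .unwrap()
        .entry(k2)
        .or_insert_with(BTreeSet::new)
        .insert(entry);
}
```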
Also, if the innermost collection only ever holds around 10 entries, then a BTree is probably overkill; a sorted Vec with binary search might do (or a VecSet abstraction). If there's a hard upper bound you could even use an array.
rustc internally uses hash-sharded maps for concurrency, but I don't know if there's an established crate for that.
Thanks for the answers!
Would you mind expanding on why an RwLock on a regular HashMap is better than DashMap? Are you thinking I would save on the lock shards?
If so, I've already run an experiment on the outer DashMap going from the default shard count (I think it's 64) to 1024 and didn't notice any difference in size.
On the BTree I agree. However, are you thinking of inserting into the Vec and sorting it on every insert? Or is there a sorted array structure I'm missing?
As long as you keep it sorted, you don't have to fully sort it every time. Just use binary_search and then insert. The idea is that insertion is very fast when there are only a few small elements.
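For example (a small sketch, with `T` standing in for whatever the entry type is):

```rust
/// Insert into an already-sorted Vec, keeping it sorted and free of duplicates.
fn insert_sorted<T: Ord>(v: &mut Vec<T>, item: T) {
    match v.binary_search(&item) {
        Ok(_) => {}                      // already present: behave like a set
        Err(pos) => v.insert(pos, item), // shifts at most a handful of elements
    }
}
```

With only ~10 elements per key2, the shift on insert is trivial, and the Vec keeps everything contiguous with no per-node overhead.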
I see, that makes sense, I'll try it. However, I didn't see a big trace of BTreeSets in my heaptrack output; it was really dominated by the DashMaps. The maps together account for more than 50% of memory occupancy for some reason.
I was hoping a SkipMap would help me, but it didn't really change the numbers.
You said that switching away from DashMap saved a lot of memory, so I'm assuming DashMap is not as compact as other data structures; perhaps its struct layout isn't as dense, or perhaps it has to keep snapshots around for concurrent readers.
Ah, now I see you said you switched to BTrees, not HashMaps. Then Lock<BTreeMap> would also be something worth trying.
Not in std, but there might be a crate. If not, such a wrapper should be fairly easy to write.
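For instance, something like this: keep the compact nested-BTreeMap layout behind a single lock, with a small sorted-Vec wrapper for the innermost entries (just a sketch; the key and entry types are placeholders):

```rust
use std::collections::BTreeMap;
use std::sync::{Arc, RwLock};

type Key1 = u64;       // placeholder key types
type Key2 = u64;
type Entry = Arc<str>; // placeholder entry type

// Minimal sorted-Vec "set": the binary_search-then-insert idea as a wrapper.
struct VecSet<T: Ord>(Vec<T>);

impl<T: Ord> Default for VecSet<T> {
    fn default() -> Self {
        VecSet(Vec::new())
    }
}

impl<T: Ord> VecSet<T> {
    fn insert(&mut self, item: T) -> bool {
        match self.0.binary_search(&item) {
            Ok(_) => false, // already present
            Err(pos) => {
                self.0.insert(pos, item);
                true
            }
        }
    }
}

// The memory-compact nested-BTreeMap layout, shared behind one RwLock.
type Store = RwLock<BTreeMap<Key1, BTreeMap<Key2, VecSet<Entry>>>>;

fn insert(store: &Store, k1: Key1, k2: Key2, entry: Entry) {
    let mut map = store.write().unwrap();
    map.entry(k1).or_default().entry(k2).or_default().insert(entry);
}
```

One big lock serializes writers, though, so for a write-heavy workload this may or may not beat per-key locking; worth measuring both.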
Unrelated to the map issues, but you can save at least one allocation per key by switching from Arc<String> to Arc<str> or &'a str. Strings have an extra layer of indirection, so each string is spending an extra 24 bytes (plus overhead from the allocator, so maybe 32 or 40 bytes really) per key, yielding maybe 504-840 MB saved.
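To illustrate the difference (just a sketch; the key strings are made up):

```rust
use std::sync::Arc;

fn main() {
    // Arc<String>: the Arc allocation holds a String header (ptr/len/cap = 24 bytes),
    // which points to a second allocation containing the actual bytes.
    let two_allocations: Arc<String> = Arc::new(String::from("some-key"));

    // Arc<str>: the bytes live directly inside the Arc allocation,
    // so the String header and its extra allocation disappear.
    let one_allocation: Arc<str> = Arc::from("some-key");

    // An existing String converts too; its bytes are copied into the Arc allocation.
    let owned = String::from("another-key");
    let converted: Arc<str> = Arc::from(owned);

    println!("{two_allocations} {one_allocation} {converted}");
}
```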
I can definitely try to optimize this. I thought str had to live on the stack and be known at compile time, which isn't my case.
However, I was hoping for a win on the actual data structures, as right now their overhead is much bigger than the data itself.