I have a very specific situation which requires, I think, a concurrent persistent hashmap. My (simplified) use case is to count bigrams in a large, changing set of data. I use rayon to parallelize the search, giving me a set of results per data item that are aggregated into a shared hashmap.
My current solution, wrapping a standard HashMap in an RwLock, scales very poorly. To avoid taking a write lock whenever counts are added to an existing key, I use AtomicUsize instead of usize. The interior mutability allows me to perform a relaxed fetch_add on a key's count without acquiring the write lock on the map. While this helps a lot, there is still plenty of contention on the map, especially in the beginning, when all keys are new and almost every access requires the write lock.
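For context, a minimal sketch of what I mean (names and the String key type are just for illustration): counting takes the read lock and does a relaxed fetch_add if the key exists, and only falls back to the write lock to insert a new key.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::RwLock;

// Shared counter map: existing keys are bumped under the read lock;
// only inserting a new key takes the write lock.
struct CountMap {
    inner: RwLock<HashMap<String, AtomicUsize>>,
}

impl CountMap {
    fn new() -> Self {
        CountMap { inner: RwLock::new(HashMap::new()) }
    }

    fn add(&self, key: &str, n: usize) {
        // Fast path: key already present, so a relaxed fetch_add
        // under the read lock suffices -- no write lock needed.
        {
            let map = self.inner.read().unwrap();
            if let Some(count) = map.get(key) {
                count.fetch_add(n, Ordering::Relaxed);
                return;
            }
        }
        // Slow path: insert under the write lock. Another thread may
        // have inserted the key in the meantime, hence the entry API.
        let mut map = self.inner.write().unwrap();
        map.entry(key.to_owned())
            .or_insert_with(|| AtomicUsize::new(0))
            .fetch_add(n, Ordering::Relaxed);
    }

    fn get(&self, key: &str) -> usize {
        self.inner
            .read()
            .unwrap()
            .get(key)
            .map_or(0, |c| c.load(Ordering::Relaxed))
    }
}

fn main() {
    let counts = CountMap::new();
    counts.add("ab", 1);
    counts.add("ab", 2);
    println!("{}", counts.get("ab")); // prints 3
}
```

The slow path is exactly where the contention shows up: while any thread holds the write lock, every other thread's fast path is blocked too.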
I have tried chashmap, but it seems to (a) give completely different results from my current approach and (b) crash non-deterministically. I have also looked into wf_hash (Non-blocking Hash Table in Rust), but it requires cloneable values, which, I think, means I would need some kind of thread-safe arena or similar to actually hold the AtomicUsize values and hand only pointers into it to the hashmap. That arena would then again be a bottleneck, as it would be shared amongst threads.
I would be very grateful for any pointers to other implementations, or to other data structures entirely, that could fit this use case. The general use case could be described as persistent, concurrent, sparse maps. (They are sparse in the sense that out of many possible keys only a small number are actually present at any given time.)