Concurrent Persistent Hashmap

I have a very specific situation which requires, I think, a concurrent persistent hashmap. My (simplified) use case is to count bigrams in a large, changing set of data. I use rayon to parallelize the search, giving me a set of results per data item, which are then aggregated into a shared hashmap.

My current solution, wrapping a standard HashMap in an RwLock, scales very poorly. To avoid taking the write lock whenever counts are added to an existing key, I use AtomicUsize instead of usize as the value type. The interior mutability lets me perform a relaxed fetch_add on a key's count without acquiring the map's write lock. While this helps a lot, there is still plenty of contention on the map, especially in the beginning, when all keys are new and almost every access requires the write lock.
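
For reference, a minimal sketch of that fast/slow path (the Bigram alias and the bump helper are placeholders, not names from the real code):

use std::collections::HashMap;
use std::sync::RwLock;
use std::sync::atomic::{AtomicUsize, Ordering};

type Bigram = (char, char); // placeholder key type

fn bump(counts: &RwLock<HashMap<Bigram, AtomicUsize>>, key: Bigram) {
    // Fast path: the key is already present, so a read lock plus a relaxed
    // fetch_add is enough thanks to the interior mutability of AtomicUsize.
    if let Some(count) = counts.read().unwrap().get(&key) {
        count.fetch_add(1, Ordering::Relaxed);
        return;
    }
    // Slow path: take the write lock to insert the key; another thread may
    // have inserted it in the meantime, so go through the entry API.
    counts
        .write()
        .unwrap()
        .entry(key)
        .or_insert_with(|| AtomicUsize::new(0))
        .fetch_add(1, Ordering::Relaxed);
}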

I have tried chashmap, but it seems to (a) give completely different results from my current approach and (b) crash non-deterministically. I have also looked into wf_hash (Non-blocking Hash Table in Rust), but it requires cloneable values, which, I think, means that I would need some kind of thread-safe arena to actually hold the AtomicUsize values, handing the hashmap only pointers into it. That arena would then again be a bottleneck, as it would be shared amongst threads.

I would be very grateful for pointers to other implementations, or to other data structures that could fit this use case. The general problem could be described as persistent, concurrent, sparse maps. (They are sparse in the sense that, out of the many possible keys, only a small number are actually present at any given time.)

Could you avoid the contention issue entirely by having each thread fill a private HashMap, and merging these HashMaps at the end? This will cause more memory traffic at the end of the computation, but the scalability benefits are usually worth it.
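
Roughly like this, assuming the data has already been split into per-thread chunks (the chunk layout and the count_all name are made up for the sketch):

use std::collections::HashMap;
use rayon::prelude::*;

// Each parallel task fills its own private map; the partial maps are merged
// sequentially at the end, so the hot loop takes no locks at all.
fn count_all(chunks: &[Vec<(char, char)>]) -> HashMap<(char, char), usize> {
    let partials: Vec<HashMap<(char, char), usize>> = chunks
        .par_iter()
        .map(|chunk| {
            let mut local: HashMap<(char, char), usize> = HashMap::new();
            for &bigram in chunk {
                *local.entry(bigram).or_insert(0) += 1;
            }
            local
        })
        .collect();

    // Merge step: the partial maps are owned here, so no synchronization.
    let mut total = HashMap::new();
    for partial in partials {
        for (bigram, n) in partial {
            *total.entry(bigram).or_insert(0) += n;
        }
    }
    total
}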

If you count bigrams (pairs of letters), perhaps you can avoid a HashMap and use a Vec<AtomicU64> instead? That is, you can map bigrams to disjoint indices directly, without a generic hash function. If you are counting something more complex than bigrams (say, k-mers in DNA), you can generalize this approach with the help of a perfect hash function.
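
For byte bigrams the index space is only 256 * 256, so the whole table can be a flat vector; a rough sketch (the function names are invented here):

use std::sync::atomic::{AtomicU64, Ordering};

// One counter per possible byte bigram: 256 * 256 slots, no hashing at all.
fn new_table() -> Vec<AtomicU64> {
    (0..256 * 256).map(|_| AtomicU64::new(0)).collect()
}

fn count_byte_bigrams(data: &[u8], counts: &[AtomicU64]) {
    for pair in data.windows(2) {
        let idx = pair[0] as usize * 256 + pair[1] as usize;
        // Threads only ever contend on the same counter, never on the table.
        counts[idx].fetch_add(1, Ordering::Relaxed);
    }
}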

Thanks for the suggestions!

I haven't had a chance to benchmark the code with that setup yet. It is definitely worth a try, I think, though I wouldn't be surprised if it slows the program down. Even with the lock contention, the CPU utilization isn't too bad, and I am worried that the sequential merge at the end will introduce too much idle time. Maybe I can make the merge step slightly more parallel by recursively merging pairs of the remaining hashmaps.

The space of possible bigrams is very, very big, and if I used a hash function I would basically be re-implementing a hashmap. But maybe that is actually the right thing to do, seeing how none of the existing maps quite match my use case.

But the set of "actually present" bigrams is bounded by the size of the dataset. One possible hack would be to statically precompute a set of the most common bigrams and use something like

struct Bigrams<'a> {
    // statically known common bigrams; this map is never resized, so its
    // counts can be bumped through the shared reference alone
    common: &'a HashMap<BiGram, AtomicUsize>,
    // everything else still goes through the lock so new keys can be inserted
    rare: RwLock<HashMap<BiGram, AtomicUsize>>,
}

This would basically look like a rayon fold to fill a local HashMap, and then reduce to merge them together. It still may feel serialized as the final reductions deal with larger maps though.
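
Something along these lines; the bigram extraction inside the fold is only a stand-in for whatever the real per-item search does:

use std::collections::HashMap;
use rayon::prelude::*;

// fold fills one private map per rayon job, reduce merges the maps pairwise.
fn count(items: &[String]) -> HashMap<(char, char), usize> {
    items
        .par_iter()
        .fold(HashMap::new, |mut local: HashMap<(char, char), usize>, item| {
            let chars: Vec<char> = item.chars().collect();
            for pair in chars.windows(2) {
                *local.entry((pair[0], pair[1])).or_insert(0) += 1;
            }
            local
        })
        .reduce(HashMap::new, |mut a, b| {
            // merge two partial maps
            for (k, v) in b {
                *a.entry(k).or_insert(0) += v;
            }
            a
        })
}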

The data set here is changing throughout the execution, and the frequency of bigrams will vary wildly between iterations.

[quote="cuviper, post:6, topic:10463"]
This would basically look like a rayon fold to fill a local HashMap, and then reduce to merge them together. It still may feel serialized as the final reductions deal with larger maps though.
[/quote]

Oh, I didn't know that existed. That would certainly be easier than coding it myself. I will report back once I have done more benchmarks. I guess there might be a way to calculate a good capacity estimate to at least avoid too many unnecessary allocations while merging.
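
For what it's worth, one way to keep reallocation down during the merge would be to always fold the smaller map into the larger one and reserve capacity up front, something like this (just a sketch):

use std::collections::HashMap;

// Merge the smaller map into the larger one, reserving for the worst case
// (all keys distinct) so each merge reallocates at most once.
fn merge(
    mut a: HashMap<(char, char), usize>,
    mut b: HashMap<(char, char), usize>,
) -> HashMap<(char, char), usize> {
    if a.len() < b.len() {
        std::mem::swap(&mut a, &mut b);
    }
    a.reserve(b.len());
    for (k, v) in b {
        *a.entry(k).or_insert(0) += v;
    }
    a
}

A function like that could then be used directly as the reduce step.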
