How do I force multithreaded mutation on an Arc<HashMap<>>?

Due to external constraints, I know there won't be read-write or write-write collisions. i.e., I know values will only be read after their writing has atomically finished, and every written key will be unique. Is it possible to force mutation on the shared hashmap even though Rust's type system is disallowing it? (from what I know, essentially Arc<> is read-only).

But writes may happen at the same time (with different keys)? That would/could still cause collisions.

UnsafeCell is the core type for mutating shared values, you could wrap your HashMap in an UnsafeCell to acquire a mutable raw pointer to the HashMap through the Arc.

You almost certainly shouldn't do that though. If you're writing to individual values in the map from multiple threads, you can wrap the HashMap's values in a Mutex, RwLock, atomics or similar and save yourself a lot of potential headaches.

5 Likes

@semicoleon
I want to prevent locking the whole map for performance reasons; current benchmarks show it's too slow for my use case.

@jbe What race conditions could happen from concurrent writing with different keys?

You can lock the individual values in the map though

Arc<HashMap<String, Mutex<String>>>

Locking isn't free, of course. But if there's no contention it shouldn't cost much (And you can probably find an alternative lock that's optimized for low contention scenarios). Then you don't have to deal with memory ordering and all of that junk.

2 Likes

The concurrent reallocation, at least (and thus essentially double-free), if two inserts happen to hit the capacity at the same time.

2 Likes

@Cerber-Ursi
I see, I suppose synchronization is necessary. If writing is exceedingly rare, would it be best to have an Arc<RwLock<HashMap<>>?

If you mostly only read and occasionally need to write, then RwLock is probably a good fit.

Generally, any reorganization (reallocation as @Cerber-Ursi said) and updating meta-data that can happen inside the HashMap, for example it's length information here.

However, if locking is a bottle-neck, you might use a specialized data structure other than std::collections::HashMap, I guess, which might be pre-allocated? But no idea if there's something like that available already.

Yes, I would probably use an RwLock instead of a Mutex, though I don't have much knowledge about performance of locks. You might also look at parking_lot, which do not implement lock poisoning and might be faster.

Maybe you can explain what sort of data is written? Perhaps there's another way to work around using a HashMap at all. Maybe write things into a linear buffer at first and sort that buffer later?

Finally, if you really are sure that no collisions happen, it's possible to use UnsafeCell, but as the name suggests, it's unsafe, and it is likely not what you need. It might be used to implement a specialized data structure though (which may be not as easy as it seems).

2 Likes

There's also the concurrent-hashmap crate that implements a concurrent hashmap. It might just be what you want.

2 Likes

To explain the data being written-- individual threads are responsible for writing a partition of the keyspace and a master thread needs to be able to read the state at any point in time. I guess to improve performance I could shard the map so individual threads own a unique one and lock there. Willing to explore that if performance bottlenecks persist.

@semicoleon @isibboi
Thanks for the suggestions and help. I benchmarked the suggestions and found Parking_lot::RwLock to be the fastest for my usage patterns (probably 1:1000 write to read ratio), only 13% slower than a lockless map.

Another possibility, since the writers don't need to read, is to have the writers not access the HashMap at all, but send each key-value pair over a channel which the master thread reads and transfers to the map, so that only the master thread is accessing the map for either reads or writes.

9 Likes

I have always wondered how much faster channels are, compared to mutexes or rw-locks. I guess it depends on the type of channel and how they are used. It looks like it's generally recommended to use channels rather than explicit locks?

In my experience, yes, when possible. Especially when the channels are zero or nearly zero cost. (I've never looked at the Rust implementation. I assume they are high quality.)

Well it depends. If you need shared state locks are generally better. But with your use case where writers don't need to access the whole state, it sounds more like mutation events. Channels are generally better than locks for events.

Generally people recommend to use the crossbeam implementation for channels in lieu of stdlib ones since they're more optimized.

Additionally, (not replying to @Coding-Badly), dashmap is another option for a concurrent HashMap which I remember getting quite a bit of praise in the rust unofficial discord server.

2 Likes