I have a particular data structure in mind, which resonates a bit with a concurrent HashMap
, yet it is not quite the same.
What I want is a map, where each first read of a key will taint that key with some identifier of the reader thread (thread-id or..). Henceforth, the first thread to taint a key is called the owner of that key. Once a key is tainted, it will never be written to by anyone other than the owner. Similarly, all consequent reads of a tainted key will also fail and only return the identifier of the owner (you could see where I might be going with this: transposition tables and task forwarding.) Only the owner will do further writes and reads, hence there is no more need for cross thread synchronisation.
What I ideally want here and I see missing is some mechanism that expresses: after the first tainting, no further operation should block. Both read and write should fail immediately. Hence, contention is sensible to exist only up until the first taint.
To the best of my knowledge, the known concurrent HashMap
implementation (chashmap, evmap) will not work great here since they will not provide the exact functionality that I need. Aside from diving too deep into these, I first wanted to try and implement my own version of this, also to learn more about Rust's fearless concurrency.
I have two ideas to investigate more:
- Using atomics
struct StateEntry<T> {
data: T,
taint: AtomicU8,
}
#[derive(Default, Debug)]
pub struct State {
backend: RefCell<HashMap<Key, StateEntry<Value>>>,
}
And to taint, each thread will try and obtain the associated entry, and do an atomic compare and swap: if the previous entry.taint
was 0
(the default, untainted), then swap it with my thread id. The rest of the operations will be done similarly and on top of this.
What I am uncertain here is if this is actually thread safe? sure, the CAS is atomic, but I am not sure what happens when two threads try and CAS into the same non-existent key. will it stay thread safe and atomic, or not? The code will look something like this:
fn try_taint(&self, key: Key, current: ThreadId) -> Result<(), ThreadId> {
let owner = self
.backend
.borrow_mut()
.entry(key)
.or_default()
.taint // what if two threads execute up until here in parallel? will one of them be allowed to go forward?
.compare_and_swap(0, current, SeqCst);
if owner == current {
Ok(())
} else {
Err(owner)
}
}
- Using RwLock only around the value, something along the lines of:
pub struct State {
backend: RefCell<HashMap<Key, RwLock<StateEntry<Value>>>>,
}
Each thread will first try and acquire a write lock, then if failure fallback to a read lock (other way around could also work). I won't expand this further for brevity, but I think all in all it suffers from the same problem. What if two keys get initiated concurrently?
I think if an already existing key gets read concurrently it is fine, as there is some lock there and only one will be able to obtain it (or both of read request --anyhow), but the case of uninitialised entries is particularly unsettling for me.
Having written this, I think I am now certain that both approaches are wrong, since neither ensures that the same keys cannot be created at the same time. As far as I know, for the same reason, within the realm of the standard library, the common way to go is to wrap the whole HashMap
inside a RwLock
or Mutex
. But this is surely too strict for my use case.
I guess my rephrased question would be what is the middle ground, if there is any? I precisely want to have locks around particular keys of the HashMap
, not the whole thing at once, given the context provided for the tainting use case.