Am I to understand that the inner closure moves the mutable map into itself and thereby becomes mutable itself?
I don't get how anybody would get there and I've read no shortage of Rust books. A lot of what's going on here feels like "now draw the rest of the owl".
You can write the thread-safe Fn version much more simply, albeit with a few performance hits compared to @steffahn's version:
use std::collections::HashMap;
use std::hash::Hash;
use std::sync::{Arc, Mutex, OnceLock};

fn memoize<A, B>(f: impl Fn(A) -> B) -> impl Fn(A) -> B
where
    A: Eq + Hash + Clone,
    B: Clone,
{
    let map = Mutex::new(HashMap::<A, Arc<OnceLock<B>>>::new());
    move |a| {
        // Clone the Arc'd slot out of the map so the lock is released before f runs.
        let entry = map.lock().unwrap().entry(a.clone()).or_default().clone();
        entry.get_or_init(|| f(a)).clone()
    }
}
As for my thought process in writing this version:
1. I started with the non-thread-safe version.
2. I added a Mutex around the map to get thread-safe interior mutability.
3. I observed that the code deadlocks, so I needed to figure out how to call f without holding the lock.
4. To do that, I stored Arc<...> in the map instead, so that I could get a reference to the relevant cache slot that escapes the mutex guard.
5. This created a need for another thread-safe interior-mutability solution to protect the modification of the cache slot, now that the Mutex wasn't in play.
6. Because the need was write-once, read-many, OnceLock was the obvious choice.
Overall, this is very similar to what @steffahn came up with, but with the downside that readers in multiple threads are all serialized (even when no write is actually necessary).
What do you mean, performance hits? In the playground, all of the numbers for your version seem better.
(Really, it seems the RwLock could probably only ever have a benefit in the case of many threads reading (already-memoized) function results in parallel. Though there is also the non-Arc-cloning fast path; one might want to test whether that alone, combined with Mutex, might still be any better.)
I'll admit, I didn't actually benchmark this at all. But yours has a few potential benefits, particularly if the usage pattern is weighted heavily towards reads, which is a reasonable assumption for a caching algorithm:
- a.clone() is only called in the write path, whereas my version calls it unconditionally.
- Similarly, yours has a fast path that avoids cloning the Arcs unnecessarily.
- Readers of pre-cached results can work completely in parallel with each other.
That all has to be balanced against the additional cost RwLock imposes over the simpler Mutex, as well as the maintenance burden of a more complicated algorithm; I can well believe that the best version is heavily workload-dependent.