Thread synchronization performance issues

I think I'm starting to understand why you needed callbacks and Arcs. You wanted to be able to run the callback asynchronously at some point after Manager::get_chunk has returned, am I correct? I see that this is what the ThreadManager does.

If so, returning a simple Result from get_chunk won't be enough, as it would break the asynchronicity of your design by requiring that the result be available at the time get_chunk returns. You would instead need something that can represent a result which is not available yet, such as a Future. The futures-cpupool crate provides a way to run tasks on a thread pool and get back futures of the results, if you are interested in that interface design direction.
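For illustration, here is a minimal standard-library-only sketch of that idea, using a channel Receiver as a poor man's future; fetch_chunk, its lookup logic, and the String error type are all hypothetical placeholders, not your actual API:

```rust
use std::sync::mpsc;
use std::thread;

// Spawn the work on another thread and hand back a Receiver that the
// caller can block on (or poll) later, after fetch_chunk has returned.
fn fetch_chunk(id: String) -> mpsc::Receiver<Result<Vec<u8>, String>> {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // Simulate producing the chunk asynchronously.
        let result = if id == "known" {
            Ok(vec![1, 2, 3])
        } else {
            Err(format!("no such chunk: {}", id))
        };
        // The caller may have dropped the receiver; ignore send errors.
        let _ = tx.send(result);
    });
    rx
}
```

The caller decides when to wait: rx.recv() blocks until the result arrives, much like calling wait() on a future.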

Sticking with callbacks for now, you could of course have your callback take a reference to a slice of data:

F: FnOnce(chunk::ChunkResult<&[u8]>) + Send + 'static

However, you must then ensure that your reference is valid at the time you invoke the callback. This is what makes this design a bit more difficult to use in a thread-safe world: you obviously cannot return a reference to lock-protected data without holding the lock, so either...

  • You run the callback while holding the lock, and risk running into lock contention issues if the callback is long-running.
  • You leverage the fact that your chunks are immutable after creation (and can thus safely be accessed concurrently by multiple threads) to keep them accessible even when the lock is not held.

An example of the second strategy would be to store individual chunks in Arcs so that references to them can escape the lock:

chunks: Arc<RwLock<HashMap<String, Arc<Vec<u8>>>>>

With this design, you could take a reference to an individual chunk...

        // Clone the Arc<Vec<u8>>, not the Vec itself, so that the
        // reference can escape the read lock.
        let chunk = self.chunks.read().unwrap().get(&config.id).cloned();

...and then, after handling the None case, use that clone as you see fit, for example to feed a slice into your callback:

callback(Ok(&chunk[offset as usize..(offset + size) as usize]));
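Putting the pieces together, here is a minimal self-contained sketch of this design; the ChunkStore name, the String error type, and the synchronous callback invocation are simplifications of your actual Manager, not a drop-in replacement:

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

// Chunks are stored behind individual Arcs so that a clone of the Arc
// can outlive the map's read lock.
struct ChunkStore {
    chunks: Arc<RwLock<HashMap<String, Arc<Vec<u8>>>>>,
}

impl ChunkStore {
    fn new() -> Self {
        ChunkStore { chunks: Arc::new(RwLock::new(HashMap::new())) }
    }

    fn insert(&self, id: &str, data: Vec<u8>) {
        self.chunks.write().unwrap().insert(id.to_string(), Arc::new(data));
    }

    fn get_chunk<F>(&self, id: &str, offset: usize, size: usize, callback: F)
    where
        F: FnOnce(Result<&[u8], String>) + Send + 'static,
    {
        // Clone the Arc under the read lock; the guard is a temporary
        // and is released at the end of this statement.
        let chunk = self.chunks.read().unwrap().get(id).cloned();
        match chunk {
            // NOTE: a real implementation should bounds-check the range
            // instead of letting the slice index panic.
            Some(chunk) => callback(Ok(&chunk[offset..offset + size])),
            None => callback(Err(format!("no such chunk: {}", id))),
        }
    }
}
```

Because the Arc clone keeps the chunk alive independently of the lock, the callback could just as well be handed off to another thread along with the clone, which is what makes this layout compatible with your asynchronous design.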

The trade-off here is additional overhead when accessing small chunks, since every access goes through thread-safe reference counting. If you expect your chunks to be small, it may be faster to copy them outright: reference counting only pays off when the data is large enough that copying it is a performance burden.
