How to design API for concurrent data structure


#1

I’m attempting to design a concurrently accessible hash table. The hash table is restricted in serveral ways, using only u32 for keys and values, and not resizable.

The current API of the table looks like this:

impl IntegerHashTable {
    pub fn new(capacity: usize) -> Result<Self, AllocErr> {
        ...
    }

    pub fn size(&self) -> usize {
        ...
    }

    pub fn is_empty(&self) -> bool {
        ...
    }

    pub fn capacity(&self) -> usize {
        ...
    }

    pub fn set(&self, key: u32, value: u32) -> Option<u32> {
        ...
    }

    pub fn get(&self, key: u32) -> Option<u32> {
        ...
    }
}

impl Drop for IntegerHashTable {
    ...
}

unsafe impl Send for IntegerHashTable {}
unsafe impl Sync for IntegerHashTable {}

My issue is twofold.

  1. set function signature should not by set(&self, ...) but instead set(&mut self). Is there a way to change this and still let the borrow checker know that multiple mutating handles are not an issue?
  2. Sharing the whole structure across threads without an Arc wrapper runs into issues with the borrow checker. I would like to add the least amount of overhead possible when updating and accessing the data structure from different threads.

Currently I have a test that looks like:

#[test]
fn multiple_thread_update() {
    const NUM_THREADS: usize = 8;

    let mut handles = Vec::new();
    let map = Arc::new(IntegerHashTable::new(128).unwrap());

    for id in 1..(NUM_THREADS + 1) {
        let map = Arc::clone(&map);
        let handle = thread::spawn(move || {
            let my_id = id as u32;

            let mut values = Vec::new();
            for key in (1..11).map(move |v| v + 10 * my_id) {
                map.set(key, my_id);
                values.push(key);
            }

            values
        });

        handles.push((id, handle))
    }

    handles.into_iter().map(|(id, h)| (id, h.join().unwrap())).for_each(|(id, thread_keys)| {
        for key in thread_keys {
            assert_eq!(map.get(key), Some(id as u32));
        }
    });
}

Is there any alternative to using an Arc?

Thanks!


#2

set function signature should not by set(&self, …) but instead set(&mut self)

Why? If you can concurrently set values, it should be &self.

The Arc is necessary because std threads only take closures with static lifetime. If you use scoped threads, you can just use borrows.


#3

Could place the Arc internally and use clone to get copies on the threads. &mut would then work.

The concurrent design internal to the structure is what Java stated out with, it typically does not lead to the best performance and often requires secondary synchronisation for other code.


#4

I agree that &self works the best. I thought it might be possible to indicate via the &mut self that it was a different type of operation than, as an example, get(&self, key: u32).


#5

So it would better to provide the data structure and then make consumers explicitly wrap it in an Arc. I agree that it would better to have the cost be explicit rather than hidden in the implementation.


#6

As a general rule it’s better the user of the structure adds the Arc Mutex/RwLock. What is best is down the the project being written. One such extreme is lock-free; in such cases the API takes back seat to implementation.