Data structures of strong and weak references for parallel access

Hi all, this is my first post here, I'll try to be as concise (but detailed) as possible without giving too many impl details unless necessary.

I'm trying to implement a data structure in my library that works with traits (trait Entity {}). It allows iterating over all the entities, where each entity has a location in a 2-dimensional grid, and it allows fast retrieval of an entity given its position in the grid, as well as retrieval of an entity's neighbors (mutably and immutably).

The main data structure could be split into 2 main components:

  • The data structure that stores the (strong references to the) entities, currently represented as

    HashMap<Id, Rc<RefCell<dyn Entity>>>

  • The data structure that stores the (weak references to the) entities (the grid), currently represented as:

    Vec<rc::Weak<RefCell<dyn Entity>>> (each index is mapped to the 2-dimensional location in the grid)

The data structures are queried by a single-threaded engine: I can iterate over each entity, mutate it, and query as well as mutate its neighbors. This works well, and the engine implementation guarantees that the same entity is never borrowed mutably twice at the same time (being run on a single thread helps), without the need for any unsafe.

Now, this "problem" of having multiple entities located in a 2-dimensional grid parallelizes quite well, at least in theory, and Rust has always had strong guarantees around parallelism and safety. So I decided to refactor the engine to iterate over each entity in parallel, while guaranteeing that no deadlock can occur (for example, when locking 2 different entities that have each other as a neighbor).

It works, and the refactoring was relatively easy: it was a matter of adding the necessary Send + Sync bounds and replacing Rc<RefCell> with its multi-threaded equivalent, Arc<RwLock>.

The only issue is that it is incredibly slow compared to the single-threaded version. I profiled the application with flamegraph, and I came to the conclusion that this was due to 2 main factors:

  • Number of entities: the benefits of running the engine on multiple threads only show up when the number of entities is considerable, especially because each entity's task is not very resource-intensive on its own. Both the entity count and the per-entity workload are defined by the users of my library, so the benefits only become significant with a high number of entities and a more demanding task.
  • Arc and RwLock overhead is significant: even when running on a single thread, the Arc<RwLock> version is orders of magnitude slower than the Rc<RefCell> one.

I was quite disappointed, and I can't think of another data structure that would let me avoid such overhead, especially because the engine implementation should be able to guarantee that no entity ever needs to be locked or wrapped in an Arc: the grid can be split across N + 1 threads, with each entity and its neighbors processed sequentially on its own specific thread (entities that share boundaries are processed last, on the final 'special' thread).

Finally, the only solution that came to my mind would be to refactor the data structures to store the strong references as HashMap<Id, Box<dyn Entity>> and the weak references as Vec<*mut dyn Entity>, which means making use of raw pointers and lots of unsafe.

This works (provided the implementation of the engine is sound, of course): there are no locks and no synchronization overhead from Arc. But it is clearly not ideal: the implementation of the engine may change, and with it its guarantees; it relies much more on the users of the library using it soundly; bugs become easier to introduce; and everything else related to why unsafe should be avoided applies.

Is there a better alternative that I might have missed, one that would allow not only the users of the library but the library itself to stay as free of unsafe as possible while still achieving high performance?

I don’t know how it would compare in performance, but you could consider a kind of double-buffered approach, where all the updates are calculated and stored first (via shared references) and then the updates are applied after (no interaction between objects).

You can then have batches of entities that are always updated together under a single lock, with something like a channel per batch to store the pending updates.

Obviously the storage requirements (and corresponding heap allocations) might end up overwhelming any benefits you’d get from reduced lock contention; profiling would be essential.

A safe approach is to have the parallel phase only read (no mutation), with writing performed as a cheaper follow-up step that gathers the results. Whether this works is very problem-dependent, so it may not suit your case.

Note that even when writing through raw pointers, synchronization is still needed across threads, or you introduce buggy, racy code.
