Hi all, this is my first post here, I'll try to be as concise (but detailed) as possible without giving too many impl details unless necessary.
I'm trying to implement a data structure in my library that works with traits (trait Entity {}
). It allows to iterate over all the entities, where each entity has a location in a 2-dimensional grid, and it allows fast retrieval of each entity given its position in the grid as well as retrieval of the entity neighbors (mutably and immutably).
The main data structure could be split into 2 main components:
-
The data structure that stores the (strong references to the) entities, currently represented as
HashMap<Id, Rc<RefCell<dyn Entity>>>
-
The data structure that stores the (weak references to the) entities (the grid), currently represented as:
Vec<rc::Weak<RefCell<dyn Entity>>>
(each index is mapped to the 2-dimensional location in the grid)
The data structures are queried in a single threaded engine: I can iterate over each entity, mutate it and query as well as mutate its neighboors. This works well, and the engine implementation guarantees that the same entity will never be borrowed mutably at the same time (and being run in a single thread helps), without the need to use any unsafe
.
Now, this "problem" of having multiple entities located in a 2-dimensional grid parallelized quite well at least in theory, and Rust always had strong guarantees over parallelism and safety. So I decided to refactor the engine to be able to iterate over each entity in parallel, and guarantee that no deadlock would occur (when locking 2 different entities that have each other as neighbor for example).
It works, and the refactoring was relatively easy: it was a matter of adding the necessary Send + Sync
bounds and replace Rc<RefCell>
with the equivalent Arc<RwLock>
for multi-threaded environment.
The only issue is that it is incredibly slow when compared to the single thread version. I profiled the application thanks to flamegraph, and I came to the conclusion that it was due to 2 main factors:
- Number of entities, the benefits of running the engine with multiple threads show only if the number of entities is considerable, especially because each entity task is not very resource consuming, but this is something that may vary depending on the users of my library, therefore with an high number of entities and a more demanding task (both defined by the user of the library) the benefits start to become significant.
-
Arc
andRwLock
overhead is significant, the difference between usingRc<RefCell>
andArc<RwLock>
even when running on a single thread is order of magnitudes slower.
I was quite disappointed, and I can't think to another data structure that would allow me to not incur in such overhead, especially because I should be able to guarantee in the engine implementation that there would never be the need to lock any entity or wrap it in an Arc
by splitting the grid into N + 1 threads and processing each entity and its neighbors sequentially in its own specific thread (where entities that share boundaries are processes in the last 'special' thread last).
Finally, the only solution that came to my mind would be to refactor the data structures to store the strong references as HashMap<Id, Box<dyn Entity>>
and the weak references as Vec<*mut dyn Entity>
, which means making use of raw pointers and lots of unsafe
.
This works (if the implementation of the engine is sound of course), there are no locks, no synchronization overhead due to the Arc
, but it is clearly not ideal: the implementation of the engine may change and with it its guarantees, it relies much more of a sound usage of the library by the users, bugs will be easier to appear, and everything else related to why unsafe
should be avoided.
Is there a better alternative that I might have missed, that would allow not only the user of the library, but the library itself to be as free as possible of unsafe
while being able to achieve high performance?