Implementing a thread-safe vec; where can I find a list of supported CPU arch features?

In the game I’m working on, I’d like to run updates in stages: run several systems concurrently, with each publishing messages to an event/effect channel, then run each system again with read-only access to the channel, and finally clear the channel.

So I’d like something like Vec, in that I want to be able to push, iterate, and clear. In the first stage of the game update logic, multiple threads must push to the Vec-like type. Next, the vec is read from start to finish on each of the game system threads. Finally, the vec is cleared at the end of the game update stage.

I think that all I need is a type with an internal array and a u8 current_index, and an unsafe method atomic_push(&self, event: Event) that atomically increments the index, then writes the event into the newly claimed slot. clear(&self) would then just erase all slots from 0 to current_index and reset current_index to 0.

I think this is correct, and I feel comfortable writing it. I even found what looks like the perfect method, currently unstable: https://doc.rust-lang.org/stable/std/sync/atomic/struct.AtomicU32.html#method.fetch_add. But I’m finding it difficult to locate concise docs on which CPUs support this operation.
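To make the idea concrete, here is a minimal sketch of the design described above. It is an assumption-laden illustration, not a vetted implementation: the capacity, the `EventBuf` name, and the generic `T` are all invented for the example, and the safety story relies entirely on the stages not overlapping (pushers are joined before any reader runs, and `clear` takes `&mut self` to guarantee exclusivity).

```rust
use std::cell::UnsafeCell;
use std::mem::MaybeUninit;
use std::sync::atomic::{AtomicU32, Ordering};

const CAP: usize = 1024; // illustrative fixed capacity

/// Fixed-capacity, append-only event buffer (sketch, not production code).
pub struct EventBuf<T> {
    len: AtomicU32,
    slots: [UnsafeCell<MaybeUninit<T>>; CAP],
}

// Safety: concurrent pushes write disjoint slots; reads and clears must only
// happen after the pushing stage's threads have been joined.
unsafe impl<T: Send> Sync for EventBuf<T> {}

impl<T> EventBuf<T> {
    pub fn new() -> Self {
        Self {
            len: AtomicU32::new(0),
            slots: std::array::from_fn(|_| UnsafeCell::new(MaybeUninit::uninit())),
        }
    }

    /// Atomically claim the next slot index, then write the event into it.
    pub fn atomic_push(&self, event: T) {
        let i = self.len.fetch_add(1, Ordering::Relaxed) as usize;
        assert!(i < CAP, "EventBuf overflow");
        unsafe { (*self.slots[i].get()).write(event) };
    }

    /// Read-only pass over all pushed events. Only sound once every pushing
    /// thread has finished (joining the threads provides the happens-before).
    pub fn iter(&self) -> impl Iterator<Item = &T> + '_ {
        let n = (self.len.load(Ordering::Acquire) as usize).min(CAP);
        (0..n).map(move |i| unsafe { (*self.slots[i].get()).assume_init_ref() })
    }

    /// Drop every event and reset. `&mut self` guarantees exclusive access.
    pub fn clear(&mut self) {
        let n = (*self.len.get_mut() as usize).min(CAP);
        for i in 0..n {
            unsafe { self.slots[i].get_mut().assume_init_drop() };
        }
        *self.len.get_mut() = 0;
    }
}
```

Note that `fetch_add` returns the previous value, so each thread gets a unique slot without any locking; the assertion guards against pushing past the fixed capacity.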

Crossbeam’s MPMC channel might be an alternative, but it seems to have the downside that consuming pops messages off, whereas I need multiple consumers to each read through the entire queue, with a third stage of the game update clearing the whole thing afterward.

There are other ways to accomplish this. This might be chasing performance in a way that’s not important, but I do expect this to be a hot area of code, and I’d like to learn about some of this plumbing anyway.


Is the array size known in advance? This procedure will fail if the Vec needs to grow, because one thread may reallocate it while another thread is still writing into the old allocation. If the size is known, I expect this to work.

If the size isn't known at the start, I would let every thread create its own vector. Then, for the second stage, you make a new vector containing all the others and use a flattening iterator to loop over everything. To share it across threads read-only, you can use an Arc; as soon as the last thread drops its handle, the Arc makes sure the memory is freed.
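A rough sketch of that per-thread-vector approach, assuming hypothetical `Event`, `produce`, and `read_all` names invented for the example:

```rust
use std::sync::Arc;
use std::thread;

// Hypothetical Event type standing in for the game's real event enum.
struct Event(u32);

/// Stage 1: every worker fills its own Vec (no sharing, no unsafe).
fn produce(workers: u32, per_worker: u32) -> Vec<Vec<Event>> {
    thread::scope(|s| {
        let handles: Vec<_> = (0..workers)
            .map(|t| s.spawn(move || (0..per_worker).map(|i| Event(t * 1000 + i)).collect()))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
}

/// Stage 2: share the vectors read-only via Arc; each reader walks every
/// event with a flattening iterator. Returns the count each reader saw.
fn read_all(events: Arc<Vec<Vec<Event>>>, readers: u32) -> Vec<usize> {
    let handles: Vec<_> = (0..readers)
        .map(|_| {
            let events = Arc::clone(&events);
            thread::spawn(move || events.iter().flatten().count())
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
    // Stage 3: when the last Arc clone is dropped, the events are freed.
}
```

The nice property here is that every stage is safe Rust: stage 1 has no sharing at all, and stage 2 shares only immutable data behind the Arc.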


Cool, yes that’s what I was thinking about the known-size versus variable size conditions. For my case, I think I could make it fixed size.

Concatenating the separate vectors at the end is probably the most practical solution, though: less mucking about with unsafety, and more guaranteed platform compatibility.

I could also try both to see the relative performance. I suspect, but haven’t verified, the single vec-like data structure would be faster by virtue of locality.

Still a newbie with threading and locality kinds of concerns. Thanks.

In the fixed-size case, I would use your atomic_push function containing the unsafe code, but with AtomicUsize instead of AtomicU32: it is stable, and vector indices need usize anyway.
