Atomic increment array

There will be hundreds of concurrent threads incrementing/decrementing cells in an array with 1b elements.

What is the best way to do this? What penalty do you pay for an atomic increment vs. a non-atomic one?

You have to use atomic operations to avoid undefined behavior. Just use Relaxed ordering if the order in which different threads observe the operations does not matter.
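
A minimal sketch of what that looks like with std's atomics and scoped threads (the element count, thread count, and index are just placeholders):

```rust
use std::sync::atomic::{AtomicU8, Ordering};
use std::thread;

fn main() {
    // A small array of atomic cells; in the real case this would be on the
    // order of a billion elements.
    let cells: Vec<AtomicU8> = (0..8).map(|_| AtomicU8::new(0)).collect();

    thread::scope(|s| {
        for _ in 0..4 {
            s.spawn(|| {
                // Relaxed is enough when only the increment itself must be
                // atomic and you don't need ordering relative to other
                // memory accesses.
                cells[3].fetch_add(1, Ordering::Relaxed);
            });
        }
    });

    println!("{}", cells[3].load(Ordering::Relaxed));
}
```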

Performance-wise, the bottleneck will be communication between the cores. This is also a problem when two cores access different memory locations that share a cache line, which is known as false sharing. Therefore, it might be beneficial to align the counters to a cache line boundary.


If you split the array into non-overlapping chunks and hand each chunk to a thread, then you don't need atomics (because there's no concurrent access) and there's no performance penalty (other than what you may pay by restructuring your algorithm to use chunking).
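
For illustration, a sketch of the chunking approach using std's scoped threads and `chunks_mut`; the sizes and the per-element work are placeholders:

```rust
use std::thread;

fn main() {
    let mut data = vec![0u8; 1_000_000];
    let num_threads = 8;
    let chunk_size = data.len() / num_threads;

    // Each thread gets exclusive (&mut) access to its own chunk,
    // so no atomics or locks are needed.
    thread::scope(|s| {
        for chunk in data.chunks_mut(chunk_size) {
            s.spawn(move || {
                for cell in chunk {
                    *cell = cell.wrapping_add(1);
                }
            });
        }
    });
}
```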

If you let all threads work on the data concurrently, then the cost depends heavily on contention and memory access patterns (a complex topic; some relevant keywords: memory hierarchy, CPU cache, cache coherency). You cannot do this without atomics or some form of locking, because concurrent non-atomic access is undefined behavior.

> the bottleneck will be communication between the cores ... it might be beneficial to align the counters to a cache line boundary

I'm not familiar with that. Can you explain?

If I understood the question correctly, they're talking about an array on the order of 10^9 elements. Cache line padding for each counter would be a giant waste of space.
Avoiding contention from tightly clustered accesses would probably be sufficient.

What is "Cache line padding"?

> Avoiding contention from tightly clustered accesses would probably be sufficient.

How would I do that?

What is "Cache line padding"?

Inflating a type's size (e.g. via the `#[repr(align(N))]` attribute) to the size of a CPU cache line, to avoid the false sharing that Bruecki mentioned. Since it increases the size, it's only worth it for a few frequently accessed items, not for bulk data.
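
A minimal sketch of such a padded counter, assuming a 64-byte cache line (the actual line size depends on the CPU):

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Inflate the counter to the (assumed) 64-byte cache line so that two
// counters never share a line; only worth it for a handful of hot counters.
#[repr(align(64))]
struct PaddedCounter(AtomicU64);

fn main() {
    let counters: Vec<PaddedCounter> =
        (0..4).map(|_| PaddedCounter(AtomicU64::new(0))).collect();

    counters[0].0.fetch_add(1, Ordering::Relaxed);

    // The padding blows each 8-byte counter up to 64 bytes.
    assert_eq!(std::mem::size_of::<PaddedCounter>(), 64);
}
```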

Perhaps the threads are already doing sufficiently independent work that they rarely access the same data, and this won't be a problem. Or perhaps your data has hotspots and you'll have to think about how to avoid having multiple threads hammer the same data over and over.

But I don't know which is the case, since you haven't provided any details about what kind of algorithms you're trying to run on the data; so far it's only been a two-line question.


Ah, that makes sense. I interpreted the 1b as a byte :sweat_smile:

Ok, I'll create a question with an example use case. My real use case is more complex.

Having read your other question about pointer dereference, I think it would be worthwhile for you to watch this video about how modern CPUs work and how to write performant code for them: https://www.youtube.com/watch?v=rglmJ6Xyj1c

Ok, will do!

The video doesn't cover Rust. Can someone explain what words to use in Rust? Just an example of many threads incrementing elements of an array efficiently?

Rust will still run on your CPU, so all those concepts will apply. Not sure what "words" you're referring to.

Here's an example of 1000 threads incrementing a random element of a 1000-byte array: Rust Playground
This is probably as efficient as you can get when incrementing bytes at random indexes (well, not considering cache padding, which would be feasible for 1000 bytes but not for 1 billion). If, however, your accesses follow some pattern, then you may be able to be more efficient, but that depends entirely on the pattern.
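
For readers who can't open the playground, here's a sketch along those lines; the exact playground code may differ, and the index computation below is just a stand-in for a real RNG:

```rust
use std::sync::atomic::{AtomicU8, Ordering};
use std::thread;

fn main() {
    // 1000 atomic byte counters shared by all threads.
    let cells: Vec<AtomicU8> = (0..1000).map(|_| AtomicU8::new(0)).collect();

    thread::scope(|s| {
        for i in 0..1000u64 {
            let cells = &cells;
            s.spawn(move || {
                // Stand-in for a random index; a real version would use a RNG.
                let idx = (i.wrapping_mul(2654435761) % cells.len() as u64) as usize;
                cells[idx].fetch_add(1, Ordering::Relaxed);
            });
        }
    });

    let total: u64 = cells.iter().map(|c| c.load(Ordering::Relaxed) as u64).sum();
    println!("total increments: {total}");
}
```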

The code you write in Rust is compiled to machine code, and a lot of the techniques described in the video apply to any compiled programming language, Rust included.

There are a few other things that are more specific to Rust, like the compile-time guarantee of unique access via mutable references and the heavy use of iterators. These allow the compiler to apply more optimizations. So far, however, your problem description has been so general that the general points from the video seemed more fitting.
