Per-cpu atomic variables?

`crossbeam::sync::ShardedLock` comes to mind as prior art: it structures its locking information to avoid the slowdown caused by concurrent access to the same cache line. Here’s the source code; it just assigns a number to each thread, counting up, and then uses this number modulo the number of shards as an index. Its approach of always taking 8 shards might be suboptimal for your purposes, but as long as the number of shards is high enough, you can either rule out cache-line conflicts altogether (if you have a small number of threads) or at least make them much less frequent on average.
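To make the idea concrete, here is a rough sketch of the same indexing trick applied to a counter instead of a lock. This is not crossbeam's code: `ShardedCounter`, `Shard`, `NEXT_THREAD_ID` and `THREAD_ID` are names I made up for illustration, and the 64-byte alignment is an assumption about the cache-line size (some CPUs use 128).

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Number of shards; 8 mirrors the fixed count mentioned above, tune as needed.
const NUM_SHARDS: usize = 8;

// Each shard gets its own cache line (assuming 64-byte lines) to avoid false sharing.
#[repr(align(64))]
struct Shard(AtomicUsize);

// Global counter that hands out thread numbers, counting up from 0.
static NEXT_THREAD_ID: AtomicUsize = AtomicUsize::new(0);

thread_local! {
    // Lazily assigned the first time a thread touches it.
    static THREAD_ID: usize = NEXT_THREAD_ID.fetch_add(1, Ordering::Relaxed);
}

pub struct ShardedCounter {
    shards: [Shard; NUM_SHARDS],
}

impl ShardedCounter {
    pub fn new() -> Self {
        ShardedCounter {
            shards: std::array::from_fn(|_| Shard(AtomicUsize::new(0))),
        }
    }

    // Adds to the shard selected by the calling thread's number modulo NUM_SHARDS.
    pub fn add(&self, n: usize) {
        let idx = THREAD_ID.with(|id| *id) % NUM_SHARDS;
        self.shards[idx].0.fetch_add(n, Ordering::Relaxed);
    }

    // Sums all shards; not an atomic snapshot while writers are still running.
    pub fn sum(&self) -> usize {
        self.shards.iter().map(|s| s.0.load(Ordering::Relaxed)).sum()
    }
}
```

With at most `NUM_SHARDS` threads ever calling `add`, each one lands on its own shard; with more, threads whose numbers are equal modulo `NUM_SHARDS` share a cache line, so the trade-off is the one described above. Reads have to visit every shard, so this only pays off when writes dominate.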
