What is the benefit of using `crossbeam_utils::Backoff` compared to `std::hint::spin_loop`?

I'm mostly interested in the spinning part.

`crossbeam_utils::backoff::Backoff` does something like this:

```rust
let mut exp = 0u32;
while fail() {
    for _ in 0..2u32.pow(exp) {
        std::hint::spin_loop();
    }
    exp += 1;
}
```

while the more classic way is to do:

```rust
while fail() {
    std::hint::spin_loop();
}
```

What is the benefit of issuing exponentially more `_mm_pause`s on every iteration, especially when the loop condition is as simple as an integer acquire load and compare?

Interesting question. I'm no expert in this at all, but I did find an interesting article[1] that explains one case where exponential backoff can help. That article about spin locks doesn't quite match all possible use-cases of this `Backoff` API… after all, its own documentation suggests a use-case where it's just waiting for some condition to become true, not for a lock to be acquired.

For the lock case, the article explains that the concern is that many threads trying to take the lock at the same time can cause more cache invalidation, and exponential backoff means fewer threads will have noticed the short window in which the lock was released before being re-acquired.

For the case of merely waiting for some value without subsequent modification, e.g. what you describe as a "simple […] integer acquire load and compare" (so I'm assuming this isn't followed immediately by a modification to said integer), I can only imagine that exponential backoff could cut down on memory access if the atomic integer being loaded has a chance of sharing a cache line with other data that is mutated regularly. Similarly for the case where there aren't multiple threads waiting to begin with. This is only my speculation, and I'm curious whether others can come up with other ideas for how it might help.

Though, after all, maybe there are cases where the backoff doesn't help at all? Well, as far as the whole of crossbeam_utils is concerned, it eventually moves on to issuing a yield, and its documentation even suggests you should eventually fall back to parking the thread, but that's a separate discussion from the idea of increasing the number of spins between each check. For that, maybe some cases don't benefit but instead perform worse with the backoff… ultimately these questions can probably only be answered reliably and conclusively by benchmarking the results, I guess.


  1. nothing special… just the first Google result I’ve found; of course no guarantees it isn’t claiming anything wrong :man_shrugging: ↩︎

So, if my lock is in its own cache line, separate from the data it's protecting, is the backoff less useful?

The cache line containing your lock (on the processors that are spinning to obtain it) is still going to be invalidated each time the lock word itself is written, regardless of where the protected data lives.
