[Review Request] Atomics usage in getrandom

#1

To optimize resource usage in the getrandom crate I’ve created the following PR: rust-random/getrandom#14 (source branch).

Because we cannot use Once (in theory the initialization function can fail, but succeed on retry), the PR implements a custom spin lock, so I’d like to get some code review. I have the following questions (a rough sketch of the pattern in question follows the list):

  • Can we use Ordering::Relaxed for the first atomic load, considering that if the state is not initialized it is followed by an Ordering::AcqRel fetch? (e.g. here)
  • Am I correct that the atomic stores here cannot be reordered under any circumstances? (i.e. other threads cannot read an uninitialized file descriptor value)
  • Which is better to use: thread::sleep(time::Duration::from_nanos(1)) or thread::yield_now()? (Note that the actual sleep time will not be smaller than the granularity of the underlying clock.) The latter may use 100% of a CPU during initialization if no other threads/processes want it, but on the other hand it’s “simpler” than sleep.
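For context, here is a minimal, self-contained sketch of the kind of pattern the PR uses. This is not the actual PR code: the open_fd helper, the concrete state values, and the error handling are made up for illustration, though RNG_STATE, RNG_FD, and STATE_USE_FD mirror the names used in the PR.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

const STATE_UNINIT: usize = 0;  // nobody has started initialization yet
const STATE_WORKING: usize = 1; // some thread is currently initializing
const STATE_USE_FD: usize = 2;  // initialization finished, RNG_FD is valid

static RNG_STATE: AtomicUsize = AtomicUsize::new(STATE_UNINIT);
static RNG_FD: AtomicUsize = AtomicUsize::new(0);

// Hypothetical fallible initializer (e.g. opening /dev/urandom).
fn open_fd() -> Result<usize, ()> {
    Ok(3) // placeholder value for the sketch
}

fn use_fd() -> Result<usize, ()> {
    loop {
        // Question 1: could this first load be Relaxed, given the AcqRel
        // compare-and-swap below and the Acquire load of RNG_FD?
        match RNG_STATE.load(Ordering::Acquire) {
            STATE_USE_FD => return Ok(RNG_FD.load(Ordering::Acquire)),
            STATE_UNINIT => {
                // Try to claim the right to initialize.
                if RNG_STATE
                    .compare_exchange(
                        STATE_UNINIT,
                        STATE_WORKING,
                        Ordering::AcqRel,
                        Ordering::Acquire,
                    )
                    .is_ok()
                {
                    let fd = match open_fd() {
                        Ok(fd) => fd,
                        Err(e) => {
                            // Initialization failed; let a later call retry.
                            RNG_STATE.store(STATE_UNINIT, Ordering::Release);
                            return Err(e);
                        }
                    };
                    // Question 2: can these two stores be reordered, i.e. can
                    // another thread observe STATE_USE_FD before the fd store?
                    RNG_FD.store(fd, Ordering::Relaxed);
                    RNG_STATE.store(STATE_USE_FD, Ordering::Release);
                    return Ok(fd);
                }
            }
            // Question 3: another thread is initializing; spin politely.
            _ => thread::yield_now(),
        }
    }
}
```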
#2
  • You must use “Acquire” on the read which checks whether the state is initialized, because if it is, you want to synchronize with said state. Using “Relaxed” means that you may not get a fully initialized state, which is unacceptable. However, if you’re worried about the overhead of the Acquire ordering, you can (after benchmarking) do a Relaxed load followed by an Acquire fence only on the path where the load shows the extra data has been initialized and you need to read it (see the sketch after this list).
  • For the snippet that you link to, SeqCst is not necessary. A Relaxed store followed by a Release store is sufficient because it already guarantees that anyone who reads STATE_USE_FD from RNG_STATE with Acquire ordering will also get an up-to-date view of the RNG_FD data.
  • thread::yield_now() is generally preferred in this kind of code, but can actually be combined with sleep. A typical mutex implementation will use some sort of bounded exponential backoff algorithm to avoid hammering the atomic in the loop (which is bad for cache performance) and wasting CPU time.
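To make the first and third points concrete, here is a rough sketch of what the reader side could look like, reusing the RNG_STATE/RNG_FD layout from the original post. The wait_for_fd name, the spin counts, and the backoff threshold are made up and not tuned; treat it as an illustration, not a drop-in implementation.

```rust
use std::sync::atomic::{fence, AtomicUsize, Ordering};
use std::thread;

const STATE_USE_FD: usize = 2;

static RNG_STATE: AtomicUsize = AtomicUsize::new(0);
static RNG_FD: AtomicUsize = AtomicUsize::new(0);

// Reader-side wait loop: a Relaxed load in the hot loop, an Acquire fence only
// once the state says the fd is ready, and bounded exponential backoff while
// some other thread is still initializing.
fn wait_for_fd() -> usize {
    let mut spins = 0u32;
    loop {
        if RNG_STATE.load(Ordering::Relaxed) == STATE_USE_FD {
            // Upgrade the Relaxed load: after this fence, everything written
            // before the corresponding Release store (including RNG_FD) is
            // guaranteed to be visible.
            fence(Ordering::Acquire);
            return RNG_FD.load(Ordering::Relaxed);
        }
        // Bounded exponential backoff: busy-wait a little at first, then
        // start yielding to avoid hammering the cache line and burning CPU.
        if spins < 6 {
            for _ in 0..(1u32 << spins) {
                std::hint::spin_loop();
            }
            spins += 1;
        } else {
            thread::yield_now();
        }
    }
}
```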

Any particular reason why you cannot use existing synchronization primitives from crates.io, e.g. parking_lot?

#3

Thank you!

But the use_fd function uses an “Acquire” load afterwards to load the file descriptor value, so shouldn’t that mean the state will always be fully initialized? Also, here the “initialization complete” flag and the file descriptor are combined into one atomic, so a Relaxed load should be correct there, right?

So, assuming that initialization is performed only once per application execution and that the initialization time is relatively small, using thread::yield_now() should be fine, right?

I thought using parking_lot would be overkill for this problem.

#4

From a memory reordering PoV, an Acquire load means “loads/stores after this point cannot be reordered before this load”. And in the linux_android case, you definitely don’t want the load from RNG_FD (whose own ordering is not relevant to this specific case) to be reordered before the load from RNG_STATE.

This is another implementation, which I’ve not had a close look at yet. But if you have no other shared writes to synchronize besides the value of the atomic itself, then yes, Relaxed is enough.
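For illustration, a minimal sketch of that situation, where the fd itself is the only shared data and lives inside the atomic value; the get_fd/open_fd names and the “fd + 1” encoding are made up here, not taken from the linked code:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// 0 means "not initialized yet"; any other value is the fd plus one, so that
// fd 0 stays representable.
static RNG_FD_PLUS_ONE: AtomicUsize = AtomicUsize::new(0);

// Hypothetical fallible initializer.
fn open_fd() -> Result<usize, ()> {
    Ok(3) // placeholder value for the sketch
}

fn get_fd() -> Result<usize, ()> {
    // Relaxed is enough here: the only shared data is the fd, and it travels
    // inside the atomic value itself, so there is nothing else to order.
    match RNG_FD_PLUS_ONE.load(Ordering::Relaxed) {
        0 => {
            let fd = open_fd()?;
            // Racing initializers may both open an fd; the last store wins.
            // A real implementation would close the loser or use
            // compare_exchange to keep exactly one fd alive.
            RNG_FD_PLUS_ONE.store(fd + 1, Ordering::Relaxed);
            Ok(fd)
        }
        n => Ok(n - 1),
    }
}
```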

Yup.

Sure, it might just save you and your team the maintenance of some lock-free code, which is nice if the library-based solution works well enough for you…

#5

Ah, so we can make the RNG_STATE load “Acquire” and the following RNG_FD load Relaxed, correct?

#6

Yes, because the synchronization point is the Acquire load from RNG_STATE and it will synchronize with everything written by the other thread before the corresponding Release store to RNG_STATE.

EDIT: If you still have some trouble getting to grips with atomic orderings, you may find this blog post interesting. If you want to learn more about the lower-level “reordering barrier” world view, this blog’s archives are a gold mine.
