When checking that the flag is set before getting the value reference, the ordering needs to be Acquire in order to synchronize with the Release store and be allowed to read from the UnsafeCell. Without that synchronization edge, you have a data race. The futex might imply sufficient synchronization in the get_or_init case, but I wouldn't rely on that, and the get case absolutely needs an Acquire.
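To make the Acquire/Release pairing concrete, here is a minimal sketch. It is not the Once type under review: `PublishCell`, `set`, `get`, and `publish_demo` are hypothetical names, there is no futex or blocking, and Drop is left out for brevity. It only shows the flag being stored with Release after the write and loaded with Acquire before the read.

```rust
use std::cell::UnsafeCell;
use std::mem::MaybeUninit;
use std::sync::atomic::{AtomicU8, Ordering};
use std::sync::Arc;
use std::thread;

const EMPTY: u8 = 0; // nothing written yet
const BUSY: u8 = 1; // one thread has claimed the slot and is writing
const READY: u8 = 2; // the value is fully written and published

/// Hypothetical write-once cell, used only to illustrate the orderings.
pub struct PublishCell<T> {
    state: AtomicU8,
    value: UnsafeCell<MaybeUninit<T>>,
}

// Sharing the cell hands out `&T` (needs `T: Sync`) and lets another
// thread store a `T` (needs `T: Send`). `Send` is derived automatically
// from the fields, which makes it require `T: Send`.
unsafe impl<T: Send + Sync> Sync for PublishCell<T> {}

impl<T> PublishCell<T> {
    pub const fn new() -> Self {
        Self {
            state: AtomicU8::new(EMPTY),
            value: UnsafeCell::new(MaybeUninit::uninit()),
        }
    }

    /// Stores `value` if the cell is still empty, otherwise hands it back.
    pub fn set(&self, value: T) -> Result<(), T> {
        // Claiming the slot reads no shared data, so Relaxed is enough here.
        if self
            .state
            .compare_exchange(EMPTY, BUSY, Ordering::Relaxed, Ordering::Relaxed)
            .is_err()
        {
            return Err(value);
        }
        // SAFETY: only the thread that moved EMPTY -> BUSY reaches this write.
        unsafe { self.value.get().cast::<T>().write(value) };
        // Release: the write above happens-before any Acquire load of READY.
        self.state.store(READY, Ordering::Release);
        Ok(())
    }

    /// Returns the value once it has been published.
    pub fn get(&self) -> Option<&T> {
        // Acquire pairs with the Release store in `set`. With Relaxed here,
        // reading the UnsafeCell below would be a data race.
        if self.state.load(Ordering::Acquire) == READY {
            // SAFETY: READY is only stored after the value is fully written.
            Some(unsafe { (&*self.value.get()).assume_init_ref() })
        } else {
            None
        }
    }
}

/// Publishes a value on one thread and polls for it on another.
fn publish_demo() -> u32 {
    let cell = Arc::new(PublishCell::new());
    let writer = {
        let cell = Arc::clone(&cell);
        thread::spawn(move || cell.set(42).unwrap())
    };
    let value = loop {
        if let Some(v) = cell.get() {
            break *v;
        }
        std::hint::spin_loop();
    };
    writer.join().unwrap();
    value
}
```

A test that calls something like `publish_demo` is also a reasonable candidate for running under Miri, since it exercises the cross-thread read.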
Miri includes basic data race and synchronization checking, so try running your test cases under Miri. It's more invasive, but supporting testing under loom also helps build confidence that your orderings are sufficient.
I'm definitely not qualified to audit unsafe code, but here are a few observations (which may be inaccurate or entirely incorrect [1]).
Your Once type leaks any initialized value. This is safe, but not friendly. It should have impl Drop unless the leaking is intentional for some specific reason (which is not documented).
This seems to have the wrong constraint:
By requiring only Sync, this allows sending a type that is not Send but is Sync to another thread, violating its invariants. One such type is MutexGuard. It has this invariant on POSIX hosts, according to this legendary post: Example of a type that is not `Send`? - #3 by Yandros
Which means it can (in theory) be exploited to send a MutexGuard to another thread to unlock a mutex it didn't originally lock, in violation of the pthreads contract:
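use std::sync::Mutex;
use std::thread;

fn main() {
    static M: Mutex<()> = Mutex::new(());
    // `Once` is the type under review in this thread.
    let o = Once::new();
    o.get_or_init(|| M.lock().unwrap());
    // Move the cell, and the guard inside it, to another thread and drop it there.
    thread::spawn(move || drop(o)).join().unwrap();
    M.lock().unwrap();
}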
We create a static mutex and then wrap the guard in a Once and drop it in another thread.
This is currently sound as-written -- because Once does not impl Drop! It leaks the guard and the mutex is never unlocked. The last line deadlocks.
Changing the constraint to impl<T: Send> Send for Once<T> [2] will allow a sound Drop impl.
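As a rough illustration of how the `T: Send` bound and a Drop impl fit together, here is a sketch under stated assumptions. `DroppingOnce`, `with_value`, `Bump`, and `drop_demo` are hypothetical names, the cell can only be initialized at construction, and none of this is the actual Once code from this thread. The bounds mirror the ones std uses for OnceLock.

```rust
use std::cell::UnsafeCell;
use std::mem::MaybeUninit;
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical cell that drops its value, used to illustrate the bounds.
pub struct DroppingOnce<T> {
    ready: AtomicBool,
    value: UnsafeCell<MaybeUninit<T>>,
}

// Moving the cell moves the `T` with it, and `Drop` may then run `T`'s
// destructor on the receiving thread, so the bound has to be `T: Send`.
unsafe impl<T: Send> Send for DroppingOnce<T> {}
// Sharing hands out `&T`, which needs `T: Sync`. `T: Send` is kept as well,
// because a real cell would also allow initialization through `&self`.
unsafe impl<T: Send + Sync> Sync for DroppingOnce<T> {}

impl<T> DroppingOnce<T> {
    /// Creates an empty cell.
    pub const fn new() -> Self {
        Self {
            ready: AtomicBool::new(false),
            value: UnsafeCell::new(MaybeUninit::uninit()),
        }
    }

    /// Creates a cell that is already initialized with `value`.
    pub fn with_value(value: T) -> Self {
        Self {
            ready: AtomicBool::new(true),
            value: UnsafeCell::new(MaybeUninit::new(value)),
        }
    }

    /// Returns the value if the cell is initialized.
    pub fn get(&self) -> Option<&T> {
        if self.ready.load(Ordering::Acquire) {
            // SAFETY: `ready` is only true when the value is initialized.
            Some(unsafe { (&*self.value.get()).assume_init_ref() })
        } else {
            None
        }
    }
}

impl<T> Drop for DroppingOnce<T> {
    fn drop(&mut self) {
        // `&mut self` proves exclusive access, so no atomic operation is needed.
        if *self.ready.get_mut() {
            // SAFETY: the flag says the value is initialized, and it is
            // dropped exactly once, here.
            unsafe { self.value.get_mut().assume_init_drop() };
        }
    }
}

/// Counts drops so the demo can observe the destructor running.
struct Bump(Arc<AtomicUsize>);

impl Drop for Bump {
    fn drop(&mut self) {
        self.0.fetch_add(1, Ordering::SeqCst);
    }
}

/// Drops an initialized cell on another thread and reports the drop count.
fn drop_demo() -> usize {
    let hits = Arc::new(AtomicUsize::new(0));
    let cell = DroppingOnce::with_value(Bump(Arc::clone(&hits)));
    thread::spawn(move || drop(cell)).join().unwrap();
    hits.load(Ordering::SeqCst)
}
```

With these bounds, a `DroppingOnce<MutexGuard<'static, ()>>` is not Send, so the `thread::spawn(move || drop(o))` line from the example above would be rejected at compile time.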
Initially, I required only T: Sync for the Send impl of Once<T>, because I thought the value would never be moved, so there would be no way to get mutable access to it or to drop it (I was only thinking of static variables :'/). But your example proves me wrong, thanks.
Edit: Hmm, now I'm not sure that the bound is wrong. Since there is no Drop implementation, nothing happens when the Once is dropped, so the value is never accessed (mutably or immutably) on a different thread. However, if we add a Drop implementation, then that bound becomes unsound.