use std::sync::atomic::{AtomicI16, Ordering};

let x = AtomicI16::new(1);
// 1 = free
// 0 = locked
let mut b = 2;
loop {
    let t = x.compare_exchange(1, 0, Ordering::Relaxed, Ordering::Relaxed);
    if t.is_ok() {
        break;
    }
}
b = 3;
let _ = x.compare_exchange(0, 1, Ordering::Relaxed, Ordering::Relaxed);
Ignoring the fact that (1) the above is single-threaded and (2) Rust's borrow checker ensures only one thread can write to `b`: is my above implementation of a spin lock wrong?
The reasoning being: Relaxed does not guarantee anything besides that the given op is atomic. Therefore, the compiler is free to reorder the above into:
b = 3;
loop {
    let t = x.compare_exchange(1, 0, Ordering::Relaxed, Ordering::Relaxed);
    if t.is_ok() {
        break;
    }
}
let _ = x.compare_exchange(0, 1, Ordering::Relaxed, Ordering::Relaxed);
and that what I really need to do is to use `Ordering` constraints that force the `b = 3` to be sandwiched between the two?
In general it's definitely "wrong" (won't behave like you want) according to the language spec and on any platform with a weak memory model, yes, but I assume you knew that much!
Whether rustc currently or in the future could generate logically incorrect code on Intel depends on:
- Whether LLVM can "see" the `Ordering` value while optimizing, and thus knows it can "optimize" according to the language specification - a good question!
- The rest of the code in the program being built, which can change the decisions the optimizer can take; this makes it tricky to produce examples demonstrating the issue.
In addition, it is legal to reorder non-atomic and Unordered loads around Monotonic loads (Monotonic is LLVM's name for Relaxed). CSE/DSE and a few other optimizations are also allowed, but Monotonic operations are unlikely to be used in ways which would make those optimizations useful.
Interestingly, Rust currently generates `lock cmpxchg` for a Relaxed `compare_exchange`, but only a plain `movzx` for a Relaxed load: Compiler Explorer
Well, the first good thing is that Rust's borrow system makes it pretty easy to very aggressively and safely write threaded code without any chance of racing, and without sacrificing performance in anything but very weird situations: knowing how to write, say, a channel lets you appreciate the edge cases that crates like crossbeam deal with, but you're not likely to beat them unless you have very particular knowledge.
The second is that if you do use atomic, you generally only have two cases:
- Dumb counters, where the only thing you care about is that no increments get dropped, and something else is handling ordering: you can use Relaxed.
- You're doing something "mutex like", where you are making a bunch of writes on one thread visible to another only when they're all done. Use stores with Release (like the end of a mutex) on the writing thread to make them visible, and load with Acquire (like the start of a mutex) on the other to see if you can read.
When in doubt and you just want to get it working, use SeqCst, as the strongest ordering.
The third is that Rust has a tool called Miri you can use to test multithreaded code. I've not used it myself yet, but it looks very cool.