The point is that the compiler will eliminate the load operation for performance. I'm curious whether there is any documentation about this in rustc, so that we can use read_volatile properly.
First of all, the point is actually that the C compiler won't eliminate the load operation, and you don't need to use volatile.
I'm not sure what load_volatile is, but Rust has std::ptr::read_volatile, and its guarantees are explained in its documentation. It is unrelated to mutexes or other thread synchronization concepts.
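For reference, read_volatile is legal on ordinary memory too; its only promise is that the load actually happens and can't be elided or merged by the optimizer. A minimal sketch (the read_once helper name is mine, not from the thread):

```rust
use std::ptr;

// Perform a load that the optimizer may not elide or merge with
// neighboring loads. Note: this gives no atomicity and no
// cross-thread synchronization whatsoever.
fn read_once(p: *const i32) -> i32 {
    unsafe { ptr::read_volatile(p) }
}

fn main() {
    let v = 7;
    assert_eq!(read_once(&v), 7);
    println!("ok");
}
```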
I can understand that volatile in Rust is intended to act on I/O memory. But if we have the code below:
use std::ptr;
use std::sync::{Arc, Mutex};

unsafe fn example(mutex: Arc<Mutex<()>>, p: *const i32) {
    let x = *p;
    let guard = mutex.lock().unwrap();
    let y = ptr::read(p); // should we use ptr::read_volatile() here?
}
should we use ptr::read_volatile() when assigning to y, in case another thread writes to p? (I know the code style is bad; it's used just for discussing volatile)
or another example
use std::ptr;
use std::sync::{Arc, Mutex};

unsafe fn example(mutex: Arc<Mutex<()>>, p: *const i32) {
    let guard = mutex.lock().unwrap();
    let x = *p;
    drop(guard);
    for _ in 0..10 {
        let guard = mutex.lock().unwrap();
        let y = ptr::read(p); // should we use ptr::read_volatile() here?
    }
}
Just like in C, whether an operation is volatile has no bearing whatsoever on questions involving concurrent access from multiple threads. Volatile accesses behave exactly like non-atomic accesses in that regard. In particular, a race between a read_volatile and any write operation to the same location is undefined behavior.
The first one is potentially UB because the first read is potentially a data race. The second one is fine, assuming the mutex and pointer are used together elsewhere. Using read_volatile doesn't change that, which the documentation is very clear about, but it may change what the UB results in.
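To make that first example race-free you would normally let the Mutex own the data itself rather than reaching it through a separate raw pointer. A hedged sketch (the add_locked helper is my own name, not from the thread):

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// The Mutex owns the i32, so every read and write is synchronized by
// the lock; no volatile access is needed, and a plain read suffices.
fn add_locked(data: &Mutex<i32>, delta: i32) {
    *data.lock().unwrap() += delta;
}

fn main() {
    let data = Arc::new(Mutex::new(0_i32));
    let handles: Vec<_> = (0..4)
        .map(|_| {
            let data = Arc::clone(&data);
            thread::spawn(move || add_locked(&data, 1))
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    // All four increments are visible because lock/unlock synchronize.
    assert_eq!(*data.lock().unwrap(), 4);
    println!("ok");
}
```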
Thanks for your patience, but I still have some doubts.
The documentation emphasizes that a volatile read/write IS NOT ATOMIC. What I'm concerned about is that in fn example() we only ever read through pointer p: will the compiler read p from memory just once, so that we may GET A STALE VALUE?
The documentation seems to say that there is no need to use volatile in concurrent programming. Then can we conclude that rustc will always read p from memory? If the answer is yes, how does rustc decide not to eliminate the redundant read operations, given that no write operation exists?
My understanding is the following: The lock operation acts as an acquire atomic operation and the unlock operation as release atomic operation. An acquire and release to the same location synchronize, causing all memory changes before the release by the thread performing the release to be observable after the acquire by the thread performing the acquire. This means that the compiler isn't allowed to move the load before the acquire (as would be necessary to deduplicate it).
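The synchronizes-with rule described above can be demonstrated with atomics directly. A sketch (a standard release/acquire publishing pattern; the names are mine): the Release store to READY "publishes" the earlier Relaxed store to DATA, so a thread whose Acquire load observes true is guaranteed to also see DATA == 123, and the compiler may not hoist or deduplicate the DATA load across the Acquire.

```rust
use std::sync::atomic::{AtomicBool, AtomicI32, Ordering};
use std::thread;

static DATA: AtomicI32 = AtomicI32::new(0);
static READY: AtomicBool = AtomicBool::new(false);

fn publish_and_read() -> i32 {
    let t = thread::spawn(|| {
        DATA.store(123, Ordering::Relaxed);
        // Release: all writes above become visible to any thread that
        // performs an Acquire load observing this store.
        READY.store(true, Ordering::Release);
    });
    while !READY.load(Ordering::Acquire) {} // spin until published
    let seen = DATA.load(Ordering::Relaxed);
    t.join().unwrap();
    seen
}

fn main() {
    assert_eq!(publish_and_read(), 123);
    println!("ok");
}
```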
Yes, I think the same way: such load optimization must be affected by memory fences or atomic operations in some way. But I did not find any formal specification about how it is affected.
use std::sync::atomic::{compiler_fence, fence, Ordering};

pub fn example(x: &[i32]) {
    let a = x[0];
    // fence(Ordering::AcqRel);
    let b = x[0]; // without a fence/compiler_fence, the load of b will not access memory
    println!("{a} {b}");
}
If we insert a fence/compiler_fence call between a and b, then a and b will access memory separately (in nightly Rust with opt-level=3); otherwise there will be only one memory load.
But in stable Rust, there will be only one memory load no matter whether we have inserted a fence or not.
I tried to search for documents about this but found nothing. I don't even know the terminology for this topic, so I'm very grateful that you understand what I'm asking.
Passing a shared reference without interior mutability as an argument to a function asserts that the pointee doesn't change for the entire duration of the function. (The Stacked Borrows and Tree Borrows memory model proposals call this a "protector".) As such, removing duplicate loads is indeed fine. If you passed a raw pointer or used UnsafeCell<i32> as the slice element type, rustc would have to reload it when the fence is present, I think.
You are right. I tried this, and stable rustc does reload the value when the fence is present:
use std::cell::UnsafeCell;
use std::sync::atomic::{compiler_fence, Ordering::AcqRel};

pub unsafe fn example(x: &[UnsafeCell<i32>]) {
    let a = *x[0].get();
    compiler_fence(AcqRel);
    let b = *x[0].get(); // accesses memory again only if the fence is present
    println!("{a} {b}");
}
Now the point is: are there any formal documents about how rustc decides to deduplicate load/store operations? Does Rust promise never to return a stale value in concurrent programs if we don't use read_volatile?
Neither Rust nor rustc guarantee what optimizations will be made. They aren't formalized and they won't be formalized. If you want to know if an optimization happens, the easy way is to compile the code in question, and the hard way is to read the source of rustc and LLVM.
The way Rust is currently designed, if you have observable differences (besides performance) in the sets of possible outcomes of read and read_volatile on non-volatile memory, it will always be undefined behavior, usually because of a data race. And when there is undefined behavior, neither Rust nor rustc make any guarantees whatsoever. Swapping read and read_volatile doesn't change that.
Atomic orderings are the only control Rust has for multithreaded data access. They are very abstract and don't correspond to a certain number of reads or a certain performance characteristic. If you want guarantees about behavior, then you must structure your program so that atomic orderings describe those guarantees. The language doesn't have a way to get guarantees about performance. There are short and long explanations of atomic orderings in Rust.
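As an illustration of structuring a program so the orderings express the guarantee you need: a shared counter only requires atomicity of each increment, not ordering relative to other variables, so Relaxed is sufficient and the final total is still guaranteed. A small sketch (count_to is my own name):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

// Each thread increments the counter n times; the result is exact
// because fetch_add is atomic, even though Relaxed imposes no ordering
// with respect to any other memory location.
fn count_to(n: usize, threads: usize) -> usize {
    let counter = Arc::new(AtomicUsize::new(0));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let c = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..n {
                    c.fetch_add(1, Ordering::Relaxed);
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    counter.load(Ordering::Relaxed)
}

fn main() {
    assert_eq!(count_to(1000, 4), 4000);
    println!("ok");
}
```

Note that no guarantee is made about how many machine-level loads or stores the compiler emits, only about the set of observable outcomes.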