I wonder, is the following code sound? Or could there be memory-access reorderings which make this unsound? I.e. are MutexGuards synchronizing access outside of the Mutex? And if yes or no, where would this be documented? Would I need memory fences?
Acquire and release semantics affect all memory, if I understand right. But wouldn't (in theory) a mutex just need to synchronize its contents (in case of a mutex that contains a value, which is the case in Rust, but isn't the case in C)?
For example, could a Mutex<u8> (in theory) be implemented with relaxed memory order and yet fulfill the API specifications? For example, aquiring a lock could set an AtomicI16 to -1 while the lock is held and releasing the lock would set it to a value between 0 and 255 (indicating the stored value). Using Ordering::Relaxed could still ensure that only one thread accesses this "mutex" at a time. Or does the terminology "mutex" imply that it also synchronizes accesses outside the mutex? And if so (or not), where is this documented/specified? Or is this common knowledge?
Sorry to ask such (possibly) "dumb" questions, but for me, synchronization is a difficult topic to understand, and I would like to get a deeper understanding. I might solve my broadcast channel problem with a less-efficient yet safe variant, but ultimately, I would like to understand how to implement locks and channels on my own (and I feel like it will require a lot of understanding to not run into soundness issues or other errors).
Implementing your own lock can be done using an atomic. You use compare-and-swap to lock it and a normal write to unlock it.
If the lock operation fails, then retrying again in a loop is called a spinlock. This works but is quite bad for performance. Real locks use some sort of mechanism to park the thread (e.g. std::thread::park) until the lock is unlocked, but this is rather complicated.
I think that the implementation of std::sync::Once is quite interesting if you find this topic interesting.
The problem with just synchronizing the guarded contents of the mutex is that the synchronization point and the guarded contents are disjoint. Even if you were to spinloop a relaxed cmpxchg on the control variable, without an acqrel boundary the disjoint memory the mutex is supposed to be guarding won't be synchronized.
The notion of a "wrapping mutex" that has strictly defined contents at the type level is, as far as I know, a relatively recent invention. Certainly a general purpose C mutex can't wrap a known type because C doesn't have that kind of type genericism. So a mutex exposed via C has to protect all of the memory operations between the lock and unlock because there's no way to define what parts of memory the lock would otherwise be protecting.
What you're describing is closer to a plain atomic value than a mutex, and I would argue an implementation which did attempt to perform that sort of optimization while calling itself a mutex would at the very least be dangerously misleading.
It wouldn’t be unreasonable for hardware or an OS to offer memory barriers scoped to something like a cache line or VM page, which could be used to provide this sort of mutex. I am not aware of any architectures that provide this facility, but neither would I consider a mutex built with them to be misleading or deceptive.
Also, there is no way (given Rust's current semantics) to identify “the contents of” a Mutex in a way that isn't equal to “all memory”: for example, Mutex<Vec<Vec<i32>>> owns a whole bunch of heap allocations, and the mutex is protecting access to them, but the code for Mutex doesn't know about the Vec's heap pointers and the code for Vec doesn't know about the Mutex, so there must be synchronization sufficient to make the locking precede, and unlocking follow, all memory accesses occurring in between the two mutex operations.
Actually, I'm pretty sure that the following design would give a mutex that protects only the contents of the mutex: The mutex is implemented with an atomic pointer pointing to its contents. When the pointer is null, the mutex is locked. Locking the mutex is done using the consume memory ordering instead of acquire. The consume ordering gives guarantees only about memory accessed through a value derived from the pointer returned from the atomic operation.
Now, Rust doesn't actually provide the consume memory ordering, but I believe that this would work in C++. If you're not familiar with it, then you can read more about the consume ordering here.
That was the point of my question: Do I have to provide such synchronization between things inside the mutex and outside of the mutex myself ("manually").
In C, I don't have to do it, because mutexes don't "carry" data. Thus it's clear they synchronize all memory access. But in Rust, this is a different story (in theory), even though in practice Rust will perform such synchronization (yet it might not be clearly documented).
A mutual exclusion primitive useful for protecting shared data
This mutex will block threads waiting for the lock to become available. The mutex can be created via a new constructor. Each mutex has a type parameter which represents the data that it is protecting. The data can only be accessed through the RAII guards returned from lock and try_lock, which guarantees that the data is only ever accessed when the mutex is locked.
It doesn't say that any access to data outside the Mutex is synchronized, does it? It also doesn't say that the shared data doesn't need to be T but could be any data elsewhere in memory.
So if this is a problem, it's an entirely hypothetical one (for now), and I can rely on mutexes synchronizing memory access of the "outside" memory too.
Thanks for pointing me to it. The source reveals that while the problem seems easy, implementation is not trivial. Using a minimum alignment (here) and the least significant bits of a pointer as extra data is an interesting trick to store more than one value in the atomic. I think Ruby also uses the trick of storing extra data in an (aligned) pointer.
Yes, though the Rust std::sync::Mutex is differently from a mutex in C as it specifies the data (type) it protects. So a Rust mutex could operate differently, depending on the promises made in the API (which might be not entirely clear).
Interestingly, lock_api::Mutex uses a RawMutex which doesn't specify the data (type) it protects. So using lock_api, it wouldn't be possible to construct such an (unusual) "mutex".
Well, the question is what does "mutex" imply. To me it's "mutually exclusive access". The wording doesn't say anything about synchronization of other memory. But I understand now that in practice every mutex does synchronize it. (At least it's reasonable to rely on that, though I think maybe documentation formally not exhaustive in that matter yet?)
It doesn't work for all data types, but a Mutex<T> where T: std::any::Any implementation could branch if the T is a u8, I guess, and use an AtomicI16 in that case as a backend. (I know this is rather hypothetical.)
Note that currently (2/2015) no known production compilers track dependency chains: consume operations are lifted to acquire operations.
The specification of release-consume ordering is being revised, and the use of memory_order_consume is temporarily discouraged. (since C++17)
It still might be possible in future (in Rust). New atomic orderings might be added to Rust; std::sync::atomic::Ordering is also marked as non-exhaustive.
I agree doing such things would be very dangerous and could lead to unsound code. So maybe it would reasonable to clarify/specify the "memory fence" properties of a std::sync::Mutex in the documentation, such that these things are ruled out/forbidden? Again, this is a pretty hypothetical problem, I guess.