A few weeks ago, on January 12th, the @rust-lang/libs team discussed a plan to i…mprove `std::sync::Mutex`, `std::sync::Condvar`, and `std::sync::RwLock`.
On most platforms, these structures are currently wrappers around their `pthread` equivalent, such as `pthread_mutex_t`. These types are not movable, however, forcing us to wrap them in a `Box`, resulting in an allocation and indirection for our lock types. This also gets in the way of a `const` constructor for these types, which makes `static` locks more complicated than necessary.
@Amanieu presented his library, `parking_lot`, which implements [the 'parking lot' structure from WebKit](https://webkit.org/blog/6161/locking-in-webkit/). Rather than letting the operating system handle any waiting queues, this manages the queues in user space in a global hash table indexed by the address of the lock. This allows for flexible 1-byte mutexes with almost no restrictions.
There has been [an attempt at merging `parking_lot` into std](https://github.com/rust-lang/rust/pull/56410), as a replacement for the implementation of std's locking primitives. However, many different blockers came up and that PR did not get merged. (See also [my RustConf talk](https://www.youtube.com/watch?v=DnYQKWs_7EA) for some of this history.) Many of these blockers have been resolved since.
One of the problems with replacing std's lock implementations by `parking_lot` is that `parking_lot` allocates memory for its global hash table. A Rust program can define its own custom allocator, and such a custom allocator will likely use the standard library's locks, creating a cyclic dependency problem where you can't allocate memory without locking, but you can't lock without first allocating the hash table.
A possible solution is to use a `static` table with a fixed number of entries, which does not scale with the number of threads. This could work well in many situations, but can become problematic in highly parallel programs and systems. Another possible solution is to skip the global allocator and directly allocate this hash table with a native system API such as `mmap` on Unix. (See [this thread](https://github.com/Amanieu/parking_lot/pull/305) for some discussion.)
In our meeting, we discussed what's natively available on various operating systems/platforms:
- On Windows, we have already switched to Windows SRW locks, which do not require allocation as they can be moved. They are small (32-bit), efficient, and `const` constructible. Just for Windows, there does not seem much advantage to switching to `parking_lot`, although there might be some (small) performance difference.
- On both Linux and some BSDs there's a native `futex()` syscall, which provides functionality similar (but more primitive) than a parking lot. Nearly equivalent functionality [is available in Wasm](https://github.com/WebAssembly/threads/blob/main/proposals/threads/Overview.md#wait-and-notify-operators). While not trivial, it's possible to implement our synchronization primitives directly based on such a futex. This makes all the primitives small (32-bit), efficient, and `const` constructible.
- On macOS and some other Unixes there doesn't seem to be anything that's similar futex syscall. ([`parking_lot` uses `pthread` for its thread parking operations on those platforms](https://github.com/Amanieu/parking_lot/blob/a75875b0bf904287a9749e8eabea919b5e9dd8a9/core/src/thread_parker/unix.rs#L30-L31).)
- [Some smaller platforms](https://github.com/rust-lang/rust/tree/master/library/std/src/sys) that are (partially) supported by the standard library have [their own non-trivial implementation](https://github.com/rust-lang/rust/tree/master/library/std/src/sys/sgx/waitqueue) of the standard locks, which are hard to maintain and have not gotten much validation.
After some discussion, the consensus was to providing the locks as 'thinnest possible wrapper' around the native lock APIs as long as they are still small, efficient, and `const` constructible. This means [SRW locks](https://docs.microsoft.com/en-us/windows/win32/sync/slim-reader-writer--srw--locks) on Windows, and futex-based locks on Linux, some BSDs, and Wasm. And for other platforms without suitable native APIs, a `parking_lot`-based implementation using one of the possible workarounds for the allocation problem.
This means that on platforms like Linux and Windows, the operating system will be responsible for managing the waiting queues of the locks, such that any kernel improvements and features like debugging facilities in this area are directly available for Rust programs.
It also means that porting the standard library to a new platform will be easier, and maintainance of the std implementation for the less common platforms will become easier. These platforms will now only have to implement [a thread parker](https://github.com/Amanieu/parking_lot/blob/a75875b0bf904287a9749e8eabea919b5e9dd8a9/core/src/thread_parker/mod.rs#L4) and can rely on a performant and correct implementation of the standard locks on top of that.
---
Primary goals:
- [ ] Efficient non-allocating locks
- [x] Windows
- [ ] Linux
- [ ] \*BSD
- [ ] macOS
- [ ] others
- [ ] Turn `std::sync::{Mutex, Condvar, RwLock}::new` into `const` functions
- [ ] Finally fix some long-standing bugs
- [ ] https://github.com/rust-lang/rust/issues/85434
Possible later goals:
- [ ] Add a `Condvar::wait_rwlock` to make `Condvar` usable with `RwLock`s?
- [ ] Allow `Send`ing `MutexGuard`s to other threads?
- [ ] Opt-in to [priority inheriting locks on Linux](https://www.kernel.org/doc/Documentation/pi-futex.txt)?
---
To do:
- [x] Relax `Condvar` requirements to allow for unboxed mutexes. https://github.com/rust-lang/rust/pull/76932
- [x] Use unboxed SRW locks on Windows.
- [x] Make Microsoft promise that SRW locks are indeed movable. https://github.com/MicrosoftDocs/sdk-api/pull/447
- [x] https://github.com/rust-lang/rust/pull/76645
- [x] Refactor `sys_common::Mutex` to have a separate `MovableMutex` type to allow unboxing on some platforms. https://github.com/rust-lang/rust/pull/77147
- [x] Remove the `Box`es in `Mutex` and `Condvar` on Windows and Wasm. https://github.com/rust-lang/rust/pull/77380
- [x] Remove the `Box` from `RwLock` on Windows and Wasm. https://github.com/rust-lang/rust/pull/84687
- [x] (Start using `WaitOnAddress` and `NtWaitForKeyedEvent` in the standard library.) https://github.com/rust-lang/rust/pull/77618
- This unblocks usage of these APIs in `parking_lot` if we want to use that on Windows too. But if we just use SRW locks, this was not necessary to unblock the lock improvements.
- [ ] Use futex-based locks on Linux.
- [x] Start using the futex syscall to get some experience with it in the standard library. https://github.com/rust-lang/rust/pull/76919
- [x] Implement the `futex()` syscall in Miri to allow Miri to run standard library tests. https://github.com/rust-lang/miri/pull/1568
- [ ] Implement `Mutex` and `Condvar` using `futex()`.
- [x] Design these.
- [x] Experimentation: https://github.com/m-ou-se/futex-lock-experiment/
- [x] Investigate other implementations (glibc, musl, Windows, etc.)
- [x] musl: https://github.com/rust-lang/rust/issues/93740#issuecomment-1041391284
- [x] glibc
- [x] Mutexes: https://github.com/rust-lang/rust/issues/93740#issuecomment-1041572651
- [x] Condition variables: https://github.com/rust-lang/rust/issues/93740#issuecomment-1048886597
- [x] Reader-writer locks: https://github.com/rust-lang/rust/issues/93740#issuecomment-1055354913
- [x] Boost: https://github.com/rust-lang/rust/issues/93740#issuecomment-1064286563
- [x] Windows' SRW locks: https://github.com/rust-lang/rust/issues/93740#issuecomment-1064139337
- [x] Wine's SRW locks: https://github.com/rust-lang/rust/issues/93740#issuecomment-1055672041
- [x] Apple Darwin libpthread: https://github.com/rust-lang/rust/issues/93740#issuecomment-1069240178
- [x] FreeBSD's libpthread: https://github.com/rust-lang/rust/issues/93740#issuecomment-1069265943
- [x] Make some design decisions: https://github.com/rust-lang/rust/issues/93740#issuecomment-1070696128
- [ ] Implementation: https://github.com/rust-lang/rust/pull/95035
- [ ] Implement `RwLock` using `futex()`.
- [ ] Write some blog post about this, including the rationale for the design decisions.
- [ ] Use futex-based locks on \*BSD.
- [ ] Implement `Mutex` and `Condvar` and `RwLock` using `futex()`.
- Re-use the Linux implementation as much as possible. (Bitset and compare operations are not available, but wait, wake, and requeue are.)
- [ ] Use `atomic.wait`/`atomic.notify` based locks on Wasm.
- [ ] Implement `Mutex` and `Condvar` and `RwLock` using `futex()`.
- Re-use the Linux implementation as much as possible. (Wasm doesn't have a requeue operation.)
- [ ] Use `parking_lot` (or `WordLock`) on other platforms.
- [ ] Work around the `parking_lot` allocation problem.
- [ ] Remove locks from `Instant::now()` so `parking_lot` can safely use it.
- [ ] Integrate `parking_lot_core::thread_parker` and `std::thread::{park, Thread::unpark}`.
- There's duplciation between these two that can potentially be removed.
- It might make sense to store `parking_lot_core`'s `ThreadData` in `std::thread::Inner` instead of a separate thread local.
- [ ] Integrate `parking_lot` into the standard library, and replace the lock implementations of many platforms by it.
- [ ] Make sure this implementation is tested on Linux too in CI.
- [ ] Mark this issue as fixed: https://github.com/rust-lang/rust/issues/85434
- [ ] Make the `new` functions `const`.
- [ ] Celebrate. :tada: