Translating a read-write lock from C to Rust

Hi all, I am implementing the read-write lock from this paper in Rust. Listing 1 on page 5 of the PDF has a clear pseudocode implementation I've tried to follow. I also have a reference implementation in C from a related paper.

My implementation: https://github.com/cmnord/pflock/blob/master/src/lib.rs
The C reference implementation: https://github.com/cmnord/pflock/blob/master/pflock_c/pft.h

Each implementation has fewer than 100 lines of code, but I've introduced a bug in translation.

The bug is reproduced in tests/simple.rs (run with cargo test simple). When readers and writers access the same shared object from different threads, sometimes the writer acquires the write lock before a reader has released a read lock. I use a RefCell as the lock-guarded object in my test to catch this bug at runtime.

Would anyone more familiar with C and Rust (and the translation between) be willing to take a look at these two implementations and see where I've messed up? They should basically correspond line-by-line.

Maybe testing with loom would help?

2 Likes

Oh wow, this is really interesting! I'll give it a try tomorrow. I was planning to try to enumerate all interleavings manually on paper, haha.

In the meantime, I would still really appreciate anyone's thoughts just from skimming over the code. It's basically translations from C's __sync_fetch_and_{} to Rust's AtomicUsize. :slightly_smiling_face:

I may be totally wrong here, I only looked for a few seconds, but in your test, as I understand it, you have a shared variable accessed via obj. I think the problem may be that you are taking a lock, but it's not the "right" lock. That is to say, you are not locking the shared variable itself; you are locking something else.

This means the shared variable may not be written back to shared memory (rather than sitting in some cache) before the next thread reads it.

[ Disclaimer: I am pretty much struggling with understanding concurrent programming in Rust, so take the above with a BIG pinch of salt! ]

It's true that unlike Rust's actual Mutex<T>, etc. my lock is not parametrized with T and doesn't take ownership of the locked object, return a MutexGuard<T>, etc. Could that actually lead to the outcome you described (guarded variable is not written back)? I want it to work before I add these nice Rust-y features.

Looking a little bit closer, in your test program ( https://github.com/cmnord/pflock/blob/master/tests/simple.rs) I see:

unsafe impl Send for DummyStruct {}
unsafe impl Sync for DummyStruct {}

My guess is that these "unsafe" keywords are what has let the problematic behaviour in...

But as above, I don't really know what I am talking about, hopefully a real expert can speak with more authority.

As background on what might be going on (each processor "core" has its own L1 cache), this link might help: https://nitish.ch/notes/time-difference-between-l1-cache-fetch-and-memory-fetch/#brief-refresher-on-cache-and-memory

These unsafe keywords are intentional -- the unsafe marker trait Sync asserts that the developer has made the type safe to share between threads. However, I intentionally did not make the type itself thread-safe. Instead, I want to show that protecting it with the lock preserves Rust's borrowing rules by allowing only one mutable reference or many immutable references at any time. RefCell panics if those rules are violated at runtime, which is why my test panics. Once the lock works, the test should no longer panic.
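A minimal sketch of that testing idea (DummyStruct and its field here are simplified stand-ins for the real type in tests/simple.rs, not the actual test code):

```rust
use std::cell::RefCell;

// A deliberately non-thread-safe type: RefCell does runtime borrow
// counting with plain (non-atomic) operations.
struct DummyStruct {
    val: RefCell<u32>,
}

// We promise Sync ourselves. If the lock is correct, a borrow_mut()
// taken under the write lock can never overlap a borrow() taken under
// a read lock, so RefCell's runtime borrow check never fires.
unsafe impl Sync for DummyStruct {}

fn exercise(obj: &DummyStruct) -> u32 {
    {
        // "writer critical section": exclusive borrow
        *obj.val.borrow_mut() += 1;
    } // the RefMut is dropped here, before any shared borrow
    // "reader critical section": shared borrow
    *obj.val.borrow()
}
```

If the lock ever lets a writer in while a reader still holds its Ref, the borrow_mut() above panics, which is exactly the signal the test relies on.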

3 Likes

Hi there,

I've not read the paper you are referring to, to be honest, but your code looks a bit overcomplicated to me for the use case of an RWLock.

I suspect the way you are using the atomics breaks your expectations at some point. For example, the call to self.win.fetch_add(1, Ordering::SeqCst) always adds 1 to the atomic and returns the previous value, so sooner or later this will interfere with the bit mask for your reader count.

Looking into your test, you spawn 3 threads each running 100 tries -- up to 300 requests in flight (0x12C), 299 of which cannot succeed while the first thread holds the lock. That clashes with your assumption of using 0x100 and 0xFF respectively as the mask for the reader count.

My proposal would be (and this is how I've implemented an RWLock) to use a simple atomic bool for the write lock and an atomic usize for the read lock. This might make your implementation more robust, as there is no need for any bit masking or the like.

The write lock can only be acquired if the read lock count is 0 and the write lock bool is false.
The read lock can only be given if the write lock bool is false no matter the actual read lock count.
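A rough sketch of what I mean (spin-based, SeqCst throughout for simplicity, and without the phase-fairness of the paper's lock):

```rust
use std::hint::spin_loop;
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering::SeqCst};

pub struct SimpleRwLock {
    writer: AtomicBool,   // is a writer holding (or claiming) the lock?
    readers: AtomicUsize, // number of readers holding the lock
}

impl SimpleRwLock {
    pub const fn new() -> Self {
        SimpleRwLock {
            writer: AtomicBool::new(false),
            readers: AtomicUsize::new(0),
        }
    }

    pub fn read_lock(&self) {
        loop {
            // optimistically register as a reader...
            self.readers.fetch_add(1, SeqCst);
            if !self.writer.load(SeqCst) {
                return; // no writer present: read lock acquired
            }
            // ...otherwise back out and wait for the writer to finish
            self.readers.fetch_sub(1, SeqCst);
            while self.writer.load(SeqCst) {
                spin_loop();
            }
        }
    }

    pub fn read_unlock(&self) {
        self.readers.fetch_sub(1, SeqCst);
    }

    pub fn write_lock(&self) {
        // claim the writer flag first (blocks new readers)...
        while self
            .writer
            .compare_exchange(false, true, SeqCst, SeqCst)
            .is_err()
        {
            spin_loop();
        }
        // ...then wait for in-flight readers to drain
        while self.readers.load(SeqCst) != 0 {
            spin_loop();
        }
    }

    pub fn write_unlock(&self) {
        self.writer.store(false, SeqCst);
    }
}
```

Note this variant is writer-preferring rather than phase-fair, so it trades away the property the paper's lock is built for.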

4 Likes

Are you sure the reference implementation can survive the test? You might try using FFI to wrap the C code in a different module and run the same test on it.

4 Likes

Yeah, there are definitely simpler RWLocks! I’d like to use the one from the paper because it’s a phase-fair read-write lock, which alternates between a reader phase and a writer phase. This is good for real-time applications.

Page 4 of the paper, under the "Phase-fair RW locks." heading, explains this more if you're interested.

Yeah, that might be good as a correctness check — I think the C implementation should work because it’s used in these papers, but maybe I’m misunderstanding how it’s supposed to work...

I'm not sure if I understand what you mean. Suppose the following permutation:

  1. Clean slate.
    rin=0 rout=0 win=0 wout=0
  2. A write request arrives.
    wticket := 0, win := 1
    wout = wticket = 0, so first spin passes
    w := 0x2 | (0 & 0x1) = 0x2
    rticket := 0, rin := 0x2
    rout = rticket = 0, so second spin passes
    
    And so the write lock is acquired.
    rin=0x2 rout=0 win=1 wout=0
  3. i = 1..299 more write requests arrive.
    wticket := i, win := i + 1
    spin until wout == i
    
    All the write requests spin.
    rin=0x2 rout=0 win=300 wout=0
  4. The first write request releases the lock.
    rin := 0x0
    wout := 1
    
    rin=0x0 rout=0 win=300 wout=1
    Now the next write request can progress.

This RWLock is a ticket lock, so a write request won't get to add 0x2 or 0x3 to rin until its turn in line. Once a write request passes the wticket == wout spin it is the only write request active (though readers could still acquire the lock before the second rticket spin). write_lock only updates rin after the first spin, so I don't think rin's lower two bits can exceed 0x3.

Could you explain the sequence of execution you had in mind? I think that the lower two bits of rin always get zeroed before another writer attempts to increment them, preventing the overflow you described.

Thank you for reading over the code! It's been really helpful for me to think about this and consider possible interleavings.

EDIT: this is also addressed in the paper at the end of page 5: "Note that the least-significant byte of rin equals zero when observed in line 23 since no other writer can be present."
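For anyone reading along without the paper open, here's roughly the shape of the lock in Rust (a simplified sketch with SeqCst everywhere, matching the C reference's constants; it is not my exact repo code):

```rust
use std::hint::spin_loop;
use std::sync::atomic::{AtomicUsize, Ordering::SeqCst};

const RINC: usize = 0x100; // reader increment (count lives above the low byte)
const WBITS: usize = 0x3;  // writer bits: PRES | PHID
const PRES: usize = 0x2;   // "writer present" bit
const PHID: usize = 0x1;   // writer phase ID bit

struct PFLock {
    rin: AtomicUsize,  // reader entry counter + writer bits
    rout: AtomicUsize, // reader exit counter
    win: AtomicUsize,  // writer ticket counter
    wout: AtomicUsize, // writer release counter
}

impl PFLock {
    const fn new() -> Self {
        PFLock {
            rin: AtomicUsize::new(0),
            rout: AtomicUsize::new(0),
            win: AtomicUsize::new(0),
            wout: AtomicUsize::new(0),
        }
    }

    fn read_lock(&self) {
        let w = self.rin.fetch_add(RINC, SeqCst) & WBITS;
        // spin while a writer is present AND the phase hasn't changed;
        // a phase flip lets blocked readers through (phase-fairness)
        while w != 0 && w == (self.rin.load(SeqCst) & WBITS) {
            spin_loop();
        }
    }

    fn read_unlock(&self) {
        self.rout.fetch_add(RINC, SeqCst);
    }

    fn write_lock(&self) {
        // take a writer ticket and wait for our turn in line
        let wticket = self.win.fetch_add(1, SeqCst);
        while wticket != self.wout.load(SeqCst) {
            spin_loop();
        }
        // announce presence + phase to readers, then wait for the
        // readers that arrived before us to drain
        let w = PRES | (wticket & PHID);
        let rticket = self.rin.fetch_add(w, SeqCst);
        while rticket != self.rout.load(SeqCst) {
            spin_loop();
        }
    }

    fn write_unlock(&self) {
        // clear the writer bits, then pass the ticket to the next writer
        self.rin.fetch_and(!WBITS, SeqCst);
        self.wout.fetch_add(1, SeqCst);
    }
}
```

Note the point from the quote above: the low byte of rin is zero whenever a writer reads it in write_lock, since at most one writer is past the ticket spin at a time.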

1 Like

I just started an implementation in this branch. However, bindgen has generated bindings for read_lock, etc. that take &mut self -- no interior mutability. If the lock doesn't provide interior mutability, then it can't be wrapped in an Arc (Arc doesn't implement DerefMut) and shared among threads, right? Do you know how I could work around this?

Hi,
well, maybe I've overlooked something. However, I'm not able to reproduce your issue on my machine with cargo test simple -- the test always succeeds...

I think it may be okay in this case to just take a shared reference and then cast it to *mut in the function. Since the C code is providing the synchronization, I think it's okay to just let the Rust code pass shared references around. I'm sure there is a better/clearer way to structure it if the FFI code were being turned into a library, but I think you should be fine for testing.

I did try what I said, and the C code also fails frequently, so it seems possible there is something wrong with the algorithm/implementation.

1 Like

Oh cool, I should have realized there was an unsafe way to do this. I did it like this based on this StackOverflow; did you do something similar?

pub fn read_lock(&self) {
    unsafe {
        let const_ptr = self as *const PFLock_C;
        let mut_ptr = const_ptr as *mut PFLock_C;
        pft_read_lock(&mut (*mut_ptr).0);
    }
}

Oh darn, I am seeing this too... :frowning: I wonder if rather than being a lock correctness issue, it instead has to do with how RefCell references are dropped? At first I thought it could be that the Ref/RefMut from the RefCell wasn't being dropped before unlock is called. I added that nested scope (code block) in the test to prevent that, though...

Be careful, your last cast is Undefined Behavior, as noted in the SO answer.

The C code provides the synchronization, so the Rust code should be okay giving it mutable references.

Just creating the exclusive reference makes your code UB, and even when you cross into C you still have to keep the promises you made to rustc.
I'm not sure it's legal to mutate an atomic directly with C atomic operations, since you don't have access to the UnsafeCell -- someone should confirm or refute this. An AtomicUsize::as_mut_ptr is coming soon, which suggests I might be right.
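If I'm right, one way to stay on the safe side is to put the C struct behind an UnsafeCell: its get() hands you a *mut without ever materializing a &mut. A self-contained sketch (pft_lock_t and pft_read_lock below are dummy stand-ins for the bindgen output, not the real bindings):

```rust
use std::cell::UnsafeCell;

// Stand-in for the bindgen-generated C struct.
#[allow(non_camel_case_types)]
#[repr(C)]
struct pft_lock_t {
    rin: u32,
}

// Stand-in for the C function; the real one does the atomic ticket
// dance on the C side.
unsafe fn pft_read_lock(l: *mut pft_lock_t) {
    unsafe { (*l).rin += 0x100 }
}

struct PFLockC(UnsafeCell<pft_lock_t>);

// Sync promise: the C side provides the actual synchronization.
unsafe impl Sync for PFLockC {}

impl PFLockC {
    fn new() -> Self {
        PFLockC(UnsafeCell::new(pft_lock_t { rin: 0 }))
    }

    pub fn read_lock(&self) {
        // UnsafeCell::get returns *mut pft_lock_t straight from &self;
        // no &mut is ever created, so the aliasing rules are respected.
        unsafe { pft_read_lock(self.0.get()) }
    }
}
```

UnsafeCell is also the sanctioned way to mutate through a shared reference, which sidesteps the const-to-mut cast entirely.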

2 Likes

Hi all, I think the issue with my test using a RefCell is that RefCell itself is not atomic -- consider the implementation of Drop (source):

impl<'b> Drop for BorrowRef<'b> {
    #[inline]
    fn drop(&mut self) {
        let borrow = self._borrow.get();
        debug_assert!(borrow != WRITING && borrow != UNUSED);
        self._borrow.set(borrow - 1);
    }
}

A read-write lock allows multiple readers at the same time, which may race. If two readers raced to drop the same RefCell, their gets and sets could interleave, resulting in an incorrect count. For example, suppose there are 2 readers, R1 and R2. They both call drop at the same time. Before both references drop, the borrow count is 2.

R1: let borrow = self._borrow.get(); // 2
R2: let borrow = self._borrow.get(); // 2
R1: self._borrow.set(borrow - 1);    // 1
R2: self._borrow.set(borrow - 1);    // 1

Both references have been dropped, but the reference count thinks that one reference is still valid.
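For contrast, the same two-reader decrement done with an AtomicUsize cannot lose an update, because fetch_sub is a single indivisible read-modify-write (a toy demo, not the actual RefCell internals):

```rust
use std::sync::atomic::{AtomicUsize, Ordering::SeqCst};
use std::sync::Arc;
use std::thread;

fn atomic_double_drop() -> usize {
    // borrow count starts at 2, one per "reader"
    let borrow = Arc::new(AtomicUsize::new(2));
    let handles: Vec<_> = (0..2)
        .map(|_| {
            let b = Arc::clone(&borrow);
            // each "reader" drops its borrow: fetch_sub can't interleave
            // with another fetch_sub the way get() + set() can
            thread::spawn(move || {
                b.fetch_sub(1, SeqCst);
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    // both decrements are always observed
    borrow.load(SeqCst)
}
```

This is essentially why std's RwLock pairs with atomics internally while RefCell is single-threaded only.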

So the issue is not with my lock implementation (I think), but with my test! I'll use an UnsafeCell as the backing data structure instead; I don't need RefCell's borrow counting anyway, since my lock handles it.

To build an RwLock<T> out of your lock type, try implementing lock_api::RawRwLock for it and using lock_api::RwLock with it.

https://docs.rs/lock_api/0.4.1/lock_api/struct.RwLock.html

1 Like

I'm just using the C implementation as a ~perfect reference~ to check that my Rust implementation doesn't have any new bugs that the C implementation doesn't have, so as long as this cast doesn't introduce new bugs I'm not too concerned about it.

Oh awesome, thanks for sharing this! :slightly_smiling_face: