I am (for my own practice) attempting to write an async mutex with queuing, with the queue being structured as an intrusive singly-linked list where each task allocates the space for its entry in the queue.
My first attempt looked something like this:
struct QueuingMutex {
queue_head: AtomicPtr<QueueEntry>,
// and other fields, including the value locked by the mutex
}
struct QueueEntry {
next_entry: Option<NonNull<QueueEntry>>,
// and other fields, including a waker
}
struct AcquireLockTask<'a> {
queue_entry: UnsafeCell<MaybeUninit<QueueEntry>>,
// and other fields, including an `&'a QueuingMutex` at the mutex we're queued for
}
impl Future for AcquireLockTask { .. }
The (main) problem with this implementation was that polling the future requires making a Pin<&mut AcquireLockTask>
, which causes the queue_entry
field to invalidate any NonNull
s pointing inside, leading to UB when anyone else tries to walk through the list (e.g. when the task that currently has the lock releases it and tries to wake the next task in the queue). As best I can tell from reading Miri's output, this invalidation happens immediately upon construction of the &mut AcquireLockTask
and can't be avoided (even if I never access queue_entry
through the &mut
).
I have found that if I change the code to look like this:
struct AcquireLockTask {
queue_entry: Box<UnsafeCell<MaybeUninit<QueueEntry>>>,
// ^ note the new `Box` wrapper here
// and the same other fields as before
}
then my code immediately becomes sound (at least, Miri calls it sound), as an &mut Box<UnsafeCell<..>>
doesn't invalidate existing pointers to the inner field. However, I'm now paying for an extra heap allocation every time a task has to wait for the lock, and I'd like to avoid heap allocations in any case, if possible.
Is there a way I can get the ownership semantics of the Box<UnsafeCell<..>>
(where an &mut
to the struct doesn't invalidate pointers to what's inside a UnsafeCell
struct field that isn't accessed through the &mut
) without also having to perform a heap allocation? Or is my current approach doomed to always be UB if I don't heap-allocate?