Unsafe code review for coi's Resolvable

I'm writing my first piece of unsafe code and would really appreciate some other eyes on this if anyone has the time. I've moved it to its own module (https://github.com/Nashenas88/coi/blob/15bfa293d01bfb67e335d31b06827c3059c3c42f/src/resolvable.rs) to limit the scope of unsafe.

What I'm trying to accomplish with that code:
This is for a dependency injection crate, and I'm supporting multiple threads possibly attempting to resolve the same item at the same time. In the past I dealt with this by hiding the resolutions behind a lot of mutexes, but it was hard to manage properly, and prone to deadlocks whenever I would make any sort of involved change.

I switched to using an atomic u8 to manage the states of resolution: unresolved, building, and resolved. The idea I had was that a single thread could guarantee that it's the one doing the building by validating that it is able to update the atomic, and in doing so, I could use UnsafeCell to ensure that no other thread is writing. However, I also needed to make sure that no other thread attempts to read the value in the UnsafeCell during the building time, so there's also a (Mutex<bool>, Condvar) tuple that's used to block all other threads until the building is done, and the value has been resolved. This way I can ensure all of the other threads are notified at once and the CPU isn't blocked with busy waiting.

I wanted to know if I may have gone overboard and don't need unsafe. Have I gotten too much tunnel vision and maybe I'm missing something more obvious?

In resolve_inner at lead the order of updates to is_building and variant isn't what you intended. You first do a SeqCst write to variant to release it as Unresolved and then lock the condition variable to set it to false. This means some other thread could have grabbed the resolution and ran the code that sets *is_building = true and now its update is running while having is_building = false. I don't think this leads to further problems other than quite a lot of cpu activity and lock contention on the building lock though.

I think you do have a concurrency bug though. If we examine the possible paths a thread could take towards unwrapping a built Option then we find that there one where no acquire load of variant takes place, this happens when both loads return RESOLVED immediately. Now, if that is the first time this thread has examined the Resolvable then it has not yet synchronized with the operation that wrote the item, which means its loads are racey. Note for example the impl of once_cell.

1 Like

Thank you for your quick response! I took some time to read the nomicon, especially the parts around ordering. I'll make sure to apply your changes and also add some tests on an arm device to see if I run into any issues. I really appreciate the feedback!

@HeroicKatora I think I solved both issues you brought up. I replace the compare_and_swap orderings from Relaxed to AcqRel. I employed double-checked locking around the switch to building to avoid the second issue you brought up. The updated code is here: https://github.com/Nashenas88/coi/blob/atomic-updates/src/resolvable.rs. I got rid of the condvar and use a RwLock instead, this I believe would allow the waiting threads to all proceed at once.

After writing that, I wondered if I could get away from unsafe by using RwLock and I have that implementation here: https://github.com/Nashenas88/coi/blob/rw-lock-resolvable/src/resolvable.rs

I have never done performance testing with threads before, so I'm not sure of the best way to compare the difference in performance between the two. I wrote this benchmark for that purpose: https://github.com/Nashenas88/coi/blob/atomic-updates/benches/threads.rs (which is the same in both branches). In the atomic-updates branch, the performance seems to be fairly consistent, but in the rw-lock-resolvable branch, I get timings that jump around. The former on my current laptop had timings around 3.4ms, with not too much variantion, while the latter would get down to almost 3.5ms, but sometimes jump up to 6 ms. I closed all other apps running on my laptop, to hopefully reduce noise from the system, but it wasn't clear to me how I could analyze this more easily/clearly.