Upgrading "dead" Weak< UnsafeCell< MaybeUninit<T>>> to Rc<T>

Hi All,

So I've created a crate maybe-rc that covers my usecase on the project but now want to make it as sound as it is possible. The main idea here is to make Rc::new_cyclic alternative to handle errors and async code.

Basic idea is that at the start I'm creating a "dead" Weak reference:

pub fn new() -> Self {
    // allocate Rc (strong = 1, weak = 1)
    let strong = Rc::new(UnsafeCell::new(MaybeUninit::uninit()));
    // create Weak (strong = 1, weak = 2)
    Self { weak: Rc::downgrade(&strong) }
    // drop Rc (strong = 0, weak = 1)
}

And than when I need to "materialize" it:

pub fn materialize(self, value: T) -> Rc<T> {
    let ptr = self.weak.into_raw();

    unsafe {
        let maybe_uninit = (*ptr).get();
        let maybe_uninit = &mut *maybe_uninit;
        maybe_uninit.write(value);
    }

    unsafe {
        Rc::increment_strong_count(ptr);
        Rc::from_raw(ptr.cast())
    }
}

I have two problems here:

  1. is it ok to strip UnsafeCell+MaybeUninit from the pointer? In theory it should be find because both of them are #[repr(transparent)] and I've just initialized the value
  2. "reviving" Rc pointer by calling increment_strong_count while also consuming Weak to ensure one weak count that is help by all strong refs

So far it seams there is no 100% sound way to achieve it but I want to at least make it as safe as possible...

Thanks.

Without considering what the documentation of Rc might promise (or in particular not promise), the basic idea seems fine. Unllike the code here, the implementations for the crate as it is on crates.io feature a questionable transmute between Weak and Rc directly.

Also I’m not sure I see any good way to make the Arc version sound, as the reference counters can only be updated in a Relaxed manner with the public API of Arc, which is likely insufficient for sound initialization. (For example, the internals of Arc::new_cyclic use Release ordering.)

Your call to increment_strong_count will hit this code:

// We insert an `assume` here to hint LLVM at an otherwise
// missed optimization.
// SAFETY: The reference count will never be zero when this is
// called.
unsafe {
    core::intrinsics::assume(strong != 0);
}

let strong = strong.wrapping_add(1);
self.strong_ref().set(strong);

You're violating the assumption in the core::intrinsics::assume call, which causes UB.

5 Likes

I didn't push these changes yet. Still trying to solve above problems.

Nice catch. I forgot about this...

hm... on my toolchain (aarch64-apple-darwin) implementation is very different:

// Retain Rc, but don't touch refcount by wrapping in ManuallyDrop
let rc = unsafe { mem::ManuallyDrop::new(Rc::<T>::from_raw(ptr)) };
// Now increase refcount, but don't drop new refcount either
let _rc_clone: mem::ManuallyDrop<_> = rc.clone();

It seams I can't rely on this... Will need to find another way. Thanks.

What I posted comes from inside the rc.clone call.

weak reference in Rc is not the same as weak reference in GC. in GC, there's the possibility when the object is unreachable through strong references (i.e. became garbage), but has not been swept yet (if there's a finalizer, it might be scheduled or even running but not finished), so you can resurrect a "dead" object from a weak reference simply by storing it into a strong reference.

for Rc, on the other hand, once the strong count reaches zero, the value is dropped eagerly, there's not such period when an object is unreachable strongly but not dead. UnsafeCell doesn't make a difference here. once dropped, it is invalid to access again. the weak pointers only keeps the dead MEMORY from being deallocated. it's like when a local variable (whatever the type is, UnsafeCell, MaybeUninit, it's irrelevant) goes out of scope, the memory of the stack frame might still be there, but you can't reference it from a "dead" pointer.

that's why the document for increment_strong_count contains this safety notice:

Safety

The pointer must have been obtained through Rc::into_raw, and the associated Rc instance must be valid (i.e. the strong count must be at least 1) for the duration of this method.

the code shown by @alice is from RcBox::inc_strong(), which relies on this very safety assumption.

since you use a pointer obtained via Weak::into_raw(), you are violating the safety precondition of Rc::increment_strong_count(), so it will never be sound. UnsafeCell and ManuallyDrop are simply irrelevant in this context.

What about Rc::new_cyclic? Inside of its closure you have an unreachable strongly but not dead reference that becomes "alive" later.

I basically want to achieve the same functionality but split into two methods instead of a single one.

Those two were two separate qustions.

no, inside the cyclic constructor, you don't have an "unreachable but not dead reference", the object itself does not exist yet, the reference is neither dead or alive, it's just a placeholder. if you insist, it is more "dead" than "not dead" since you cannot upgrade the weak reference. also, it later become alive, but not though the weak reference.

"new_cyclic()" is meant to create self referential data structures, not for two stage constructor. for RC, if the value is alive, it is strongly reachable; if it is cannot be reachable from strong pointers, it is dead, the "unreachable strongly but not dead" state is only possible with references that lazily destruct the value such as GC, but with GC you don't need special treatment to create self referential data structure because GC is designed to handle cyclic references.

what's point to split one constructor into two?

anyway, the idea probably can be implemented, but not in the same API as your example. there's a unstable uninitialized constructor feature which might be interesting.

Rc and Weak are tightly coupled with the non-public type RcBox, if you want to allocate memory but not to construct the object, it is inevitable to utilize types like RcBox<MaybeUninit<T>>, or RcBox<Option<T>> (which is more overhead but safer). your original implementation could have worked, if you just use regular Rc<MaybeUninit<T>> instead of Weak<...> (UnsafeCell is unnecessary , you can simply use Rc::get_mut_unchecked() since you have an single owned Rc).

Weak::from_raw() can only be called with pointers returned from Weak::into_raw(), and Rc::from_raw() can only be called with pointers return from Rc::into_raw(); mixing them up is UB, even if at runtime they happen to point to the same memory address.

Just for fun, here’s a safe implementation of MaybeArc that’s super inefficient, and limited to T: Send + Sync + 'static.

use std::sync::{mpsc, Arc, Weak};
use std::thread;

struct MaybeArc<T> {
    thread: thread::JoinHandle<Arc<T>>,
    weak: Weak<T>,
    init_chan: mpsc::SyncSender<T>,
}

impl<T: Send + Sync + 'static> MaybeArc<T> {
    pub fn new() -> Self {
        let (init_chan, init_val) = mpsc::sync_channel(0);
        let (weak_tx, weak_rc) = mpsc::sync_channel(0);
        let thread = thread::spawn(move || {
            Arc::new_cyclic(|weak| {
                weak_tx.send(weak.clone()).unwrap();
                init_val.recv().unwrap()
            })
        });
        let weak = weak_rc.recv().unwrap();
        Self {
            thread,
            weak,
            init_chan,
        }
    }
    pub fn downgrade(&self) -> Weak<T> {
        self.weak.clone()
    }
    pub fn materialize(self, value: T) -> Arc<T> {
        self.init_chan.send(value).unwrap();
        self.thread.join().unwrap()
    }
}
4 Likes

Wow, this is some next level of inefficiency. At least functionality wise it fits what I've tried to achieve :slight_smile:

If you're looking for general two-step initialization, the Rust-for-Linux project has relevant library code.

Not really. You've missed the point of receiving non-upgradable Weak<Parent> references until UniqueRc.write is called. I need Rc<Child> to be already constructed when calling write while child requires Weak<Parent>.

I will give them credit for MaybeUninit. it could probably be added to UniqueRc from Tracking Issue for `UniqueRc`/`UniqueArc` · Issue #112566 · rust-lang/rust · GitHub. That way it will cover my usecase. UniqueRc in this implementation allows creating non-upgradable Weak until it is converted into Rc.

But it requires either fully copying implementation of Rc/Arc from std (like Rust-For-Linux did) or changes in the std itself :frowning:

I do believe that the idea/proposal of adding this kind of API to std is very reasonable. Arc and Rc notably have some convenience features that aren’t reproducible for non-standard clones of these types; additionally the desire for compatibility with APIs that prescribe the concrete type Arc/Rc from std is another concern.

The current new_cyclic has set essentially all the precedent for 2-step initialization, and as I’ve demonstrated above, for Arc such a thing – not restricted to the closure body – is already possible (in principle). Usage in async is a great motivation for eliminating the need for a closure. The exact form of what API std should gain of course could be debated. To empower 3rd party libraries to explore this space further, all that’d be necessary would be a method that can legally increment the strong count from zero to one (i.e. no unsafe assumptions against this, and for Arc the proper Release order for the counter update [1]); alternatively, or additionally, one could also think about what designs std might want to propose itself.

I’d say it’s reasonable to post a question to internals.rust-lang.org what others think of this kind of thing.


  1. an interesting follow-up question, and something that’d definitely need to be properly documented, is whether or not there’d be any unsafe but permitted way, through this API, to re-initialize a Weak after the last Rc has been dropped ↩︎

3 Likes

Created a post there - 2-step initialization for Rc/Arc - libs - Rust Internals.

1 Like

all you need to effectively convert the Relaxed atomic to a Release atomic is to put a Release thread fence after all writes and before the Relaxed atomic.

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.