sync::Weak upgrades to None even if value hasn't been dropped yet

Hi,

I just fell over this one while reworking a Rust abstraction for a C API. I guess code says more than a thousand words :slight_smile:

use std::thread;

mod binding {
    use std::sync::atomic::{AtomicBool, Ordering};
    use std::sync::{Arc, Mutex, Weak};

    pub struct CLibrary;

    static C_LIBRARY_HANDLE: Mutex<Weak<CLibrary>> = Mutex::new(Weak::new());
    static C_LIBRARY_INIT: AtomicBool = AtomicBool::new(false);

    impl CLibrary {
        fn new() -> Self {
            assert!(
                !C_LIBRARY_INIT.load(Ordering::SeqCst),
                "Library shouldn't be initialized"
            );
            C_LIBRARY_INIT.store(true, Ordering::SeqCst);
            Self
        }
        pub fn instance() -> Arc<Self> {
            let mut guard = C_LIBRARY_HANDLE.lock().unwrap();

            match guard.upgrade() {
                Some(instance) => instance,
                None => {
                    let instance = Arc::new(Self::new());
                    *guard = Arc::downgrade(&instance);
                    instance
                }
            }
        }
    }

    impl Drop for CLibrary {
        fn drop(&mut self) {
            C_LIBRARY_INIT.store(false, Ordering::SeqCst);
        }
    }
}

fn main() {
    let ts: Vec<_> = (0..100)
        .map(|_| {
            thread::spawn(|| {
                let _ = binding::CLibrary::instance();
            })
        })
        .collect();

    for t in ts {
        t.join().unwrap();
    }
}

To my surprise this will panic occasionally (can take some time, I ran it in a while shell loop until the assertion kicked in).

I assumed that Weak::upgrade will only yield None when the value wrapped by the Arc has been dropped. At least that's what I'm getting from the Documentation

Attempts to upgrade the Weak pointer to an Arc, delaying dropping of the inner value if successful.

Returns None if the inner value has since been dropped.

In the real code I'm not using the AtomicBool but rather call into C initialization functions and deinitialization functions in the new resp. drop methods. The initialization function yells at me when It is already initialized.

I hope I'm not getting tricked by some Memory Ordering thingy within the Arc and my AtomicBool, but why would this then fail with the C function calls (I hope they are thread safe :clown_face: ).

Any help or hints would highly be appreciated :slight_smile:

this line immediately drops the Arc.

when the system is busy, there's a chance where some of the threads all finished already (which means all strong references are dropped), while the rest of the threads have not even started (or even been created or scheduled) yet. once the "late" threads start to run, they will see the Weak already gone, so the initialization function will be called again.

if the C library only provides an API to initialize, but doesn't provide the corresponding API to de-initialize, then you should use LazyLock instead of Mutex<Weak<T>>.


EDIT:

read your post again, the C library DOES have a deinitialization function. however, the cause of the panic is the same underlying race condition:

  • the mutex only protects the initialization of the weak reference, but doesn't protect the deinitialization when strong references reaches zero.

in the scenario I described above, when certain threads finish early, the strong Arc gets dropped, the deinitialization is called (may have not returned yet), but in the meanwhile, some "late" threads just get started to run and because the strong count already reduced to zero, Weak::update() will give None, so the initialization routine is called again.

1 Like

I totally expect that all strong references get dropped and Weak::upgrade yields None. But as far as I understood the documentation (as quoted in the initial post) I'm somewhat guaranteed that drop of the underlying value has been finished when Weak::upgrade yields None.

the deinitialization is called (may have not returned yet), but in the meanwhile, some "late" threads just get started to run and because the strong count already reduced to zero, Weak::update() will give None, so the initialization routine is called again

Is this behavior documented somewhere?

If this is really the case then I think the documentation should be adjusted slightly for Weak

Note that whether or not it should be documented, this behavior is necessary, because updating the reference count is an atomic operation and destructors cannot be in general, so there has to be some point at which upgrade() stops working but the destructor of the contents hasn’t finished.

1 Like

Well, I just had a look at the source of Arc and Weak and indeed drop of the underlying data might not have been finished or even called when Weak::upgrade reports None.

In particular the first thing Arc does in drop is decrement the strong counter
sync.rs - source, Weak might sneak in between the following cleanup Code of drop by checking the strong counter and report None sync.rs - source.

I should've given it a second thought from the beginning I guess. Now while looking at the code and knowing that Arc uses Atomics for synchronization, I guess this behaviour should be obvious, sorry for that.

Now I just need a better way to deal with the problem :smiley: Any suggestions?

Well, the problem here comes from having three separately synchronized pieces of state: the Arc, C_LIBRARY_HANDLE, and C_LIBRARY_INIT. It would be easier to be correct if there were fewer. So, how about getting rid of C_LIBRARY_INIT and letting C_LIBRARY_HANDLE represent that state?

mod binding {
    use std::sync::{Arc, Mutex, Weak};

    pub struct CLibrary;

    /// Invariant:
    /// * If the library is not initialized, the `Option` is `None`.
    /// * If the library is initialized, the `Option` is `Some`.
    /// * If the state is in the progress of changing, the mutex is locked.
    static C_LIBRARY_HANDLE: Mutex<Option<Weak<CLibrary>>> = Mutex::new(None);

    impl CLibrary {
        fn new_unchecked() -> Self {
            Self
        }
        pub fn instance() -> Arc<Self> {
            loop {
                // At this point we block if a previous de-initialization is happening --
                // but we *might* sneak in before it takes the lock.
                let mut guard = C_LIBRARY_HANDLE.lock().unwrap();

                match guard.as_ref().map(|weak| weak.upgrade()) {
                    // All’s well — return previous instance.
                    Some(Some(instance)) => return instance,

                    // Oops, we acquired the lock but the destructor is going to want it.
                    // Try to get in line again behind the destructor.
                    Some(None) => {
                        drop(guard);
                        std::thread::yield_now();
                    }

                    // Not initialized. Time to initialize.
                    None => {
                        let instance = Arc::new(Self::new_unchecked());
                        *guard = Some(Arc::downgrade(&instance));
                        return instance;
                    }
                }
            }
        }
    }

    impl Drop for CLibrary {
        fn drop(&mut self) {
            let mut guard = C_LIBRARY_HANDLE.lock().unwrap();

            // now call the C deinit function
            std::thread::sleep(std::time::Duration::from_micros(100));

            *guard = None; // timing of this doesn't actually matter, but it does matter the lock is held
        }
    }
}

This isn’t perfect since it contains a sort of spin-loop. The optimal version would, I imagine, use some sort of customized reference-counter in place of Arc.

1 Like

Oh very cool, thank you :smiley:

I'll definitely try that. I'm also considering something like that

mod binding {
    use std::sync::Mutex;

    pub struct CLibrary;

    unsafe fn c_init_mock() {
        println!("Init");
    }
    unsafe fn c_deinit_mock() {
        println!("Deinit");
    }

    static C_LIBRARY_INIT: Mutex<bool> = Mutex::new(false);

    impl CLibrary {
        pub fn new() -> Result<Self, ()> {
            let mut is_init = C_LIBRARY_INIT.lock().unwrap();
            if *is_init {
                Err(())
            } else {
                unsafe { c_init_mock() };
                *is_init = true;
                Ok(Self)
            }
        }
    }

    impl Drop for CLibrary {
        fn drop(&mut self) {
            let mut is_init = C_LIBRARY_INIT.lock().unwrap();
            assert!(*is_init);

            unsafe { c_deinit_mock() };
            *is_init = false;
        }
    }
}

Then just wrap it into an Arc and send it to the worker threads. I think this would more closely match the actual real world use case of the library.