A Guide to Global Data in Rust

:wave: Hey everyone, I'm currently working on a guide to global data in Rust, including let, const, include_str, include_bytes, lazy_static, phf, include, and static mut. I would love to hear your feedback on how the guide can be improved and augmented!
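
As a rough illustration of the two simplest items on that list, here is a minimal sketch of `const` versus `static`. The item names are made up for the example and are not taken from the guide.

```rust
// `const` items are inlined wherever they are used: each use gets its own copy.
const MAX_RETRIES: u32 = 3;

// `static` items live at one fixed address for the whole program, so every
// reference to GREETING points at the same memory.
static GREETING: &str = "hello";

/// Reads the compile-time constant.
fn max_retries() -> u32 {
    MAX_RETRIES
}

/// Returns data borrowed from the static item, which lives for `'static`.
fn greeting() -> &'static str {
    GREETING
}
```

Neither item can be mutated, which is what makes them trivially safe to share between threads.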

I would consider adding thread_local! and once_cell to the guide. Personally, I find them very useful and they seem to be popular options for others as well.

I also find arc-swap useful. It allows me to load a new version of the object without locking.

I could see dependency injection in your list too.

There is a pretty good answer about singletons on Stack Overflow that might be helpful as well.

Thanks for the suggestion! I added a once_cell example. How would I use thread_local! for global data? By making thread-local copies of an immutable configuration, maybe?
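
For readers on a recent toolchain, the standard library now ships `std::sync::OnceLock` (stable since Rust 1.70), which covers the common `once_cell::sync::OnceCell` use case of a lazily initialized global. The sketch below uses it in place of the once_cell crate; `CONFIG`, `config`, and the keys are hypothetical.

```rust
use std::collections::HashMap;
use std::sync::OnceLock;

// Hypothetical global configuration. OnceLock::new is const, so it can sit
// in a plain static even though HashMap::new cannot.
static CONFIG: OnceLock<HashMap<&'static str, &'static str>> = OnceLock::new();

/// Returns the global config, running the initializer only on the first call.
/// Every later call, from any thread, gets a reference to the same map.
fn config() -> &'static HashMap<&'static str, &'static str> {
    CONFIG.get_or_init(|| {
        let mut map = HashMap::new();
        map.insert("log_level", "info");
        map.insert("region", "eu-west");
        map
    })
}
```

After initialization the data is read-only, so no lock is involved on the read path.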

@kornel thanks for the suggestion! I just added arc-swap.

@ronlobo thanks for pointing me to that Stack Overflow thread!

I think DI is a great idea, but I'm a little hesitant to add it in here as I feel like DI is a deep topic in and of itself. Maybe I could take a swing at writing a DI guide, if one doesn't already exist.
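
For comparison, a very small sketch of what the dependency-injection alternative looks like: instead of reading a global, a component receives the data it needs when it is constructed. `Config` and `Greeter` are hypothetical names.

```rust
/// Hypothetical configuration that would otherwise live in a global.
struct Config {
    greeting: String,
}

/// The service holds a reference to its dependency instead of reaching for a global.
struct Greeter<'a> {
    config: &'a Config,
}

impl<'a> Greeter<'a> {
    /// Constructor injection: the caller decides which Config to pass in.
    fn new(config: &'a Config) -> Self {
        Self { config }
    }

    fn greet(&self, name: &str) -> String {
        format!("{}, {}!", self.config.greeting, name)
    }
}
```

The trade-off is that the dependency has to be threaded through every constructor, in exchange for no global state at all.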

How would I use thread_local! for global data? By making thread-local copies of an immutable configuration, maybe?

It might be outside the scope of the article, since the data will be scoped to the current thread rather than strictly global, but I feel thread_local! is like The Fourth Beatle: it goes unnoticed even when it should be considered.

  • Some programs are simply single-threaded. You can have mutable global data and avoid the cost of synchronization, since you don't need it.

  • Some multi-threaded programs don't need shared mutable state between threads but still want the advantages of thread-local data, which can be great for certain APIs or just for the initialization guarantees (only one instance can exist per thread). I've seen globals that require synchronization used in these cases just because their authors didn't know about thread_local!.

  • Types don't need to be Sync

  • You can use a RefCell for interior mutability

  • Values get dropped when the thread exits (with some caveats)

And probably more.

Granted, once_cell can cover these cases as well, but since thread_local! is in the standard library, I thought it might be worth a mention.
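
A minimal sketch of several of the points above: per-thread mutable state with no locking, a `Cell` (which is not `Sync`) inside `thread_local!`, and a `RefCell` for non-`Copy` data. The names `COUNTER`, `LOG`, `bump`, `log_message`, and `count_on_new_thread` are illustrative.

```rust
use std::cell::{Cell, RefCell};
use std::thread;

thread_local! {
    // Each thread gets its own counter, so no Mutex is needed.
    // Cell is not Sync, which is fine because the value is never shared.
    static COUNTER: Cell<u32> = Cell::new(0);
    // RefCell provides interior mutability for non-Copy data.
    static LOG: RefCell<Vec<String>> = RefCell::new(Vec::new());
}

/// Increments this thread's counter and returns the new value.
fn bump() -> u32 {
    COUNTER.with(|c| {
        c.set(c.get() + 1);
        c.get()
    })
}

/// Appends a message to this thread's log and returns the number of entries.
fn log_message(msg: &str) -> usize {
    LOG.with(|log| {
        let mut log = log.borrow_mut();
        log.push(msg.to_string());
        log.len()
    })
}

/// Calls `bump` `n` times on a freshly spawned thread and returns that
/// thread's final count, which starts from 0 regardless of other threads.
fn count_on_new_thread(n: u32) -> u32 {
    thread::spawn(move || {
        let mut last = 0;
        for _ in 0..n {
            last = bump();
        }
        last
    })
    .join()
    .unwrap()
}
```

Bumping the counter on a new thread leaves the calling thread's counter untouched, which is exactly the "global per thread" behavior described above.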

Yeah, I think I want to keep it out of the scope of the article, but I would like to write something about it: I know that thread_local! exists, but I have never really known what to use it for.

:warning: Careful: static mut is a construct that still exists for backwards compatibility. It is expected to be eventually deprecated, given how easy it is to write unsound code and trigger UB when using it :warning:

So I highly advise against suggesting its usage. It can always be replaced with a plain static VAR using a safe shared-mutable wrapper, such as Mutex/RwLock (in conjunction with lazy_static! or once_cell, or some const fn constructor such as ::parking_lot::const_mutex) in a multi-threaded context, and Cell/RefCell (in conjunction with thread_local!) otherwise.

  • Regarding FFI: in the general case, instead of a *mut Thing you use a *const Mutex<Thing>, for instance, and (*ptr).lock() it on each use. In practice, when Thing is just an integer, you could simply cast your *const Cell<integer> to a *mut integer and operate on the latter, provided you never upgrade that pointer to &'_ integer or &'_ mut integer. Still, staying with a *const Cell<integer> that you upgrade to &'_ Cell<integer> to use .set() and .get() is less error-prone.
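
A sketch of the FFI advice above, showing only the Rust side; the foreign caller that holds the pointer is assumed. `Thing`, `record_hit`, and `add_one` are hypothetical names, and the integer case uses `i32`.

```rust
use std::cell::Cell;
use std::sync::Mutex;

/// Hypothetical state shared with foreign code.
struct Thing {
    hits: u32,
}

/// General case: foreign code holds a `*const Mutex<Thing>` instead of a
/// `*mut Thing`, and every access goes through the lock.
///
/// # Safety
/// `ptr` must point to a live `Mutex<Thing>`.
unsafe extern "C" fn record_hit(ptr: *const Mutex<Thing>) {
    let mutex = unsafe { &*ptr };
    mutex.lock().unwrap().hits += 1;
}

/// Integer case: upgrade `*const Cell<i32>` to `&Cell<i32>` and use
/// `.get()`/`.set()`, never creating a `&i32` or `&mut i32`.
///
/// # Safety
/// `ptr` must point to a live `Cell<i32>` that is only used from one thread.
unsafe extern "C" fn add_one(ptr: *const Cell<i32>) {
    let cell = unsafe { &*ptr };
    cell.set(cell.get() + 1);
}
```

In both functions the only unsafe step is turning the raw pointer back into a shared reference; all mutation then happens through the wrapper's safe API.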

Hey @Yandros, thanks for this! You're right that including mutable static items in the guide was a mistake. I removed the examples and replaced them with a dire warning. I also added a section on immutable statics with a parking_lot example to give people an idea of how they can implement mutable global data more safely. I feel bad about including mutable statics because I know that some programmers will think of them as an easy way to "just get things done" without necessarily understanding all the requirements to use them safely. :pensive:
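
As a std-only counterpart to a parking_lot example: `Mutex::new` and `RwLock::new` have been `const fn` in the standard library since Rust 1.63, so a plain `static` can hold them directly. A minimal sketch with hypothetical `NAMES` and `MOTD` globals:

```rust
use std::sync::{Mutex, RwLock};

// Mutex::new, RwLock::new, Vec::new and String::new are all const fn,
// so these plain statics need no lazy initialization.
static NAMES: Mutex<Vec<String>> = Mutex::new(Vec::new());
static MOTD: RwLock<String> = RwLock::new(String::new());

/// Appends a name to the global list and returns the new length.
fn add_name(name: &str) -> usize {
    let mut names = NAMES.lock().unwrap();
    names.push(name.to_string());
    names.len()
}

/// Replaces the global message of the day (takes the write lock).
fn set_motd(text: &str) {
    *MOTD.write().unwrap() = text.to_string();
}

/// Returns a copy of the message of the day (takes a read lock).
fn motd() -> String {
    MOTD.read().unwrap().clone()
}
```

Every mutation goes through a lock guard, so this compiles without any `unsafe`, unlike the equivalent `static mut`.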

Great! I love how it is phrased in the repo right now :ok_hand:

Hey, that's a great idea for a guide! Here's my addition.

I have a pretty specific use-case: a library that is LD_PRELOAD-ed into a game, hooking some functions. Since I know that pretty much all of the functions I care about are going to be called from a single main game thread, I came up with this global variable scheme.

  1. There's a MainThreadMarker struct that is neither Send nor Sync. If you have one of these, you're on the main game thread.
    use std::marker::PhantomData;
    
    /// This marker serves as a static guarantee of being on the main game thread. Functions that
    /// should only be called from the main game thread should accept an argument of this type.
    #[derive(Clone, Copy)]
    pub struct MainThreadMarker {
        // Mark as !Send and !Sync.
        _marker: PhantomData<*const ()>,
    }
    
    impl MainThreadMarker {
        /// Creates a new `MainThreadMarker`.
        ///
        /// # Safety
        /// This should only be called from the main game thread.
        #[inline]
        pub unsafe fn new() -> Self {
            Self {
                _marker: PhantomData,
            }
        }
    }
    
    Hooked functions construct the marker and pass it down:
    #[no_mangle]
    pub unsafe extern "C" fn Host_Shutdown() {
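        // abort_on_panic is a helper that is not shown in this post.
        abort_on_panic(move || {
            // Safety: the game calls this hook on its main thread.
            let marker = MainThreadMarker::new();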
    
            some_rust_function(marker);
        });
    }
    
  2. All global state is stored in MainThreadCells or MainThreadRefCells, which provide safe access if you have a marker:
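    use std::cell::Cell;
    
    /// Cell accessible only from the main thread.
    pub struct MainThreadCell<T>(Cell<T>);
    
    // Safety: all methods are guarded with MainThreadMarker.
    // Send is intentionally not implemented by hand: Cell<T> is already Send
    // when T: Send, and an unconditional impl would let a !Send T (such as Rc)
    // be dropped on another thread.
    unsafe impl<T> Sync for MainThreadCell<T> {}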
    
    impl<T> MainThreadCell<T> {
        /// Creates a new `MainThreadCell` containing the given value.
        pub const fn new(value: T) -> Self {
            Self(Cell::new(value))
        }
    
        /// Sets the contained value.
        pub fn set(&self, _marker: MainThreadMarker, val: T) {
            self.0.set(val);
        }
    }
    
    impl<T: Copy> MainThreadCell<T> {
        /// Returns a copy of the contained value.
        pub fn get(&self, _marker: MainThreadMarker) -> T {
            self.0.get()
        }
    }
    
    static GLOBAL_MUTABLE_VALUE: MainThreadCell<i32> = MainThreadCell::new(16);
    
    fn some_rust_function(marker: MainThreadMarker) {
        let value = GLOBAL_MUTABLE_VALUE.get(marker);
        GLOBAL_MUTABLE_VALUE.set(marker, 32);
    }
    

This way there's none of the runtime overhead that thread_local! or mutexes bring, and the RefCell variant still ensures that the global data never has multiple exclusive references.
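
The post mentions a RefCell variant but shows only `MainThreadCell`. The following is a hypothetical reconstruction of what a `MainThreadRefCell` could look like under the same marker scheme, not the author's actual code; `GLOBAL_LIST` and `push_value` are made up for the example.

```rust
use std::cell::{Ref, RefCell, RefMut};
use std::marker::PhantomData;

/// Same marker as above: !Send and !Sync.
#[derive(Clone, Copy)]
pub struct MainThreadMarker {
    _marker: PhantomData<*const ()>,
}

impl MainThreadMarker {
    /// # Safety
    /// This should only be called from the main game thread.
    pub unsafe fn new() -> Self {
        Self { _marker: PhantomData }
    }
}

/// RefCell accessible only from the main thread (hypothetical sketch).
pub struct MainThreadRefCell<T>(RefCell<T>);

// Safety: every accessor requires a MainThreadMarker, so the inner RefCell
// (including its borrow flag) is only ever touched from the main thread.
unsafe impl<T> Sync for MainThreadRefCell<T> {}

impl<T> MainThreadRefCell<T> {
    pub const fn new(value: T) -> Self {
        Self(RefCell::new(value))
    }

    /// Shared borrow; panics if the value is currently mutably borrowed.
    pub fn borrow(&self, _marker: MainThreadMarker) -> Ref<'_, T> {
        self.0.borrow()
    }

    /// Exclusive borrow; panics if the value is already borrowed.
    pub fn borrow_mut(&self, _marker: MainThreadMarker) -> RefMut<'_, T> {
        self.0.borrow_mut()
    }
}

static GLOBAL_LIST: MainThreadRefCell<Vec<i32>> = MainThreadRefCell::new(Vec::new());

/// Pushes a value onto the global list and returns its new length.
fn push_value(marker: MainThreadMarker, value: i32) -> usize {
    let mut list = GLOBAL_LIST.borrow_mut(marker);
    list.push(value);
    list.len()
}
```

The RefCell's borrow flag is what rules out two simultaneous exclusive references; a conflicting borrow panics instead of causing UB.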

Hey @YaLTeR, that's an interesting technique. Thanks for sharing! Is there a way to encapsulate the unsafe functionality in a crate so that others can make use of the technique without needing to write any unsafe? Right now the guide is only safe code, and I think it might be better to keep it that way.

There might be a way to remove the unsafe from MainThreadMarker::new() by storing the thread ID on the first call and then verifying that it's the same on subsequent calls, but I haven't explored this yet.
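
A minimal sketch of that idea, assuming that whichever thread first asks for a marker is the main game thread. It records that thread's `ThreadId` in a `std::sync::OnceLock` and hands out markers only on that thread; `try_new` and `MAIN_THREAD` are hypothetical names.

```rust
use std::marker::PhantomData;
use std::sync::OnceLock;
use std::thread::{self, ThreadId};

/// ID of the first thread that asked for a marker.
/// Assumption: that thread is the main game thread.
static MAIN_THREAD: OnceLock<ThreadId> = OnceLock::new();

/// Same marker as before: !Send and !Sync.
#[derive(Clone, Copy)]
pub struct MainThreadMarker {
    _marker: PhantomData<*const ()>,
}

impl MainThreadMarker {
    /// Hypothetical safe constructor. The first call records the calling
    /// thread; afterwards it returns `Some` only on that same thread.
    pub fn try_new() -> Option<Self> {
        let current = thread::current().id();
        let main = *MAIN_THREAD.get_or_init(|| current);
        if main == current {
            Some(Self { _marker: PhantomData })
        } else {
            None
        }
    }
}
```

Because the marker is still !Send, a marker obtained this way can never leave the recorded thread. The trade-offs: every construction now pays for a `thread::current()` call and a comparison, which the `unsafe fn new()` avoids, and if some hook ever ran on another thread before the main thread did, that thread would be recorded instead.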