Static vec vs static hashmap. What's the difference?

I was playing around with handling things in Rust and I have run into a situation I can't explain. In the code below the static vector works fine, but the static hashmap will not compile. Why is there a difference in how they are handled, and is there a way to make the hashmap work?

use std::collections::HashMap;

fn foo(i: u64) {
    static mut x: Vec<u64> = Vec::new(); // works okay
    unsafe {x.push(i);}
    println!("{}", unsafe{x.len()});
}

fn bar(i: u64) {
    static mut x: HashMap<u64, u64> = HashMap::default(); // not allowed
    unsafe {x.insert(i, 0);}
    println!("{}", unsafe{x.len()});
}

fn main() {
    foo(1);
    foo(2);
    foo(3);

    bar(1);
    bar(2);
    bar(3);
}

HashMap::new is not a const fn.

The most common solution is to use once_cell or lazy_static instead of static mut for types like HashMap that cannot be constructed at compile time.
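
As a rough illustration of that approach, here is a minimal sketch that uses the standard library's `std::sync::OnceLock` (Rust 1.70+) as a stand-in for once_cell / lazy_static; that substitution, the helper name `map`, and the changed return type of `bar` are assumptions made for the example, not something from the question.

```rust
use std::collections::HashMap;
use std::sync::{Mutex, OnceLock};

// Hypothetical helper: returns the global map, creating it on first use.
// `OnceLock::new` is a const fn, so the static itself can be built at
// compile time even though `HashMap::new` cannot.
fn map() -> &'static Mutex<HashMap<u64, u64>> {
    static X: OnceLock<Mutex<HashMap<u64, u64>>> = OnceLock::new();
    X.get_or_init(|| Mutex::new(HashMap::new()))
}

// Same shape as `bar` in the question, but returns the new length
// instead of printing it.
fn bar(i: u64) -> usize {
    let mut x = map().lock().unwrap();
    x.insert(i, 0);
    x.len()
}

fn main() {
    for i in 1..=3 {
        println!("{}", bar(i));
    }
}
```

No `unsafe` is needed here, because all mutation goes through the `Mutex`.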


I have never found a once_cell or lazy_static example that does not require using a mutex. Is there a way around that?

I found out how to do it with once_cell

    static mut x: Lazy<HashMap<u64, u64>> = Lazy::new(|| {HashMap::default()});

But without a mutex can you actually insert anything into it?

The initializer can just be this instead:

Lazy::new(HashMap::default)

They can, of course. The other question is whether they should, since static mut usage is unsafe for a reason.


oh right I didn't notice the mut there. static mut requires an unsafe block to mutate as far as I remember.

static mut does not enforce Send + Sync. Make sure to wrap your maps in Arc<RwLock<T>> or Arc<Mutex<T>>


You don't need an Arc in a static though, at most you'd need it to interact with other code that does require it.


If something is static, it is (potentially) globally accessible, and thus, shared. That means that all the accesses will need to happen through a shared reference (&_): safe unchecked exclusive accesses are thus out of the question.

Should you wish to mutate the static, you have three options:

  • Assert that the mutations only happen at initialization time, since the creation of a static happens in an exclusive context (either at compile time, through a const fn / const expression constructor, or at runtime, through some special / magical mechanism, such as lazy_static! or the more explicit once_cell, which offers the exact same functionality).
    Once the value has been created, the static is then immutable and thus safe to share everywhere (including across threads ⇒ Sync).

  • Use a mechanism for safe shared mutation, called shared mutability or interior mutability, by wrapping the value within one such wrapper.

    • Single thread (?Sync)
      Assert that the mutations will never happen across multiple threads, by forbidding any kind of access whatsoever to the same static from multiple threads: use thread_local! statics. This enables all the shared mutability wrappers, even those that are !Sync (not Sync), such as Cell, for small values (such as integers), or RefCell:

      thread_local! {
          static X: RefCell< HashMap<u64, u64> > = HashMap::new().into();
      }
      X.with(|it| {
          let mut x = it.borrow_mut();
          x.insert(i, 0);
          println!("{}", x.len());
      });
      
    • Multi-threaded (Sync)
      This exposes the static to parallel accesses, thus some synchronization mechanism is required to make it sound. You thus need to use a shared mutability wrapper that does feature such a mechanism, such as RwLock (or Mutex).

      lazy_static! {
          static ref X: RwLock< HashMap<u64, u64> > = HashMap::new().into();
      }
      let mut x = X.write().unwrap();
      x.insert(1, 0);
      println!("{}", x.len());
      

      Note that an acceptable "alternative" is to use some special HashMap type that features this synchronization within its internals, so as to offer a more efficient implementation, such as DashMap or ConMap (you will notice these data structures offer &self-based mutation APIs while remaining Sync / thread-safe).

  • Or assert that you don't want any of the Rust safety guarantees, since "you know better"

    ⚠️☠️Danger Zone ☠️⚠️

    That's the only case where you might be able to use static mut, or its saner alternative,

    use lib::UnsafeRacyCell;
    mod lib {
        use ::std::cell::UnsafeCell;

        pub
        struct UnsafeRacyCell<T> /* = */ (
            UnsafeCell<T>,
        );
    
        impl<T> UnsafeRacyCell<T> {
            pub
            const
            fn new (value: T)
              -> Self
            {
                Self(UnsafeCell::new(value))
            }
    
            pub
            unsafe // Safety: must not be called concurrently!
            fn with_mut_unchecked<R> (
                self: &'_ Self,
                ret: impl FnOnce(&'_ mut T) -> R,
            ) -> R
            {
                ret(&mut *self.0.get())
            }
        }
    
        unsafe // Safety: no non-unsafe API whatsoever
            impl<T : Send> Sync for UnsafeRacyCell<T>
            // where T : Sync,  /* To be added if `with_ref_unchecked` were added */
            {}
    }
    

    And then use it:

    unsafe // Safety: must not be called in parallel / simultaneously by multiple threads
    fn bar (i: u64)
    {
        // `HashMap::new()` is not a `const fn`, so the map is built lazily
        // on first access (here through `once_cell::sync::Lazy`).
        static X: UnsafeRacyCell< Lazy<HashMap<u64, u64>> > =
            UnsafeRacyCell::new( Lazy::new(HashMap::new) );
        X.with_mut_unchecked(|x| {
            x.insert(i, 0); // Drops the previous value (if any). If it had arbitrary drop glue, it could re-entrantly call `bar` !
            println!("{}", x.len());
        });
    }
    
    • Safer pattern
      X.with_mut_unchecked(|x| {
          let prev = x.insert(i, 0);
          println!("{}", x.len());
          prev
      }); // <- dropped here, after the exclusive area ends, which avoids the re-entrancy in the critical section
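
The first option above (mutation only at initialization time) comes without a snippet, so here is a minimal sketch of it. It assumes the standard library's `std::sync::OnceLock` (Rust 1.70+) in place of lazy_static! / once_cell, and the `squares` table is a made-up example.

```rust
use std::collections::HashMap;
use std::sync::OnceLock;

// Hypothetical read-only table: filled once on first access,
// never mutated afterwards.
fn squares() -> &'static HashMap<u64, u64> {
    static TABLE: OnceLock<HashMap<u64, u64>> = OnceLock::new();
    TABLE.get_or_init(|| (0..10).map(|n| (n, n * n)).collect())
}

fn main() {
    // Reads go through a plain shared reference; no Mutex or RwLock
    // is involved, because nothing can mutate the map after creation.
    println!("{:?}", squares().get(&7));
}
```

Since only `&HashMap` is ever handed out, this stays safe to share across threads without any further wrapper.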
      

I want to do this with as little performance overhead as possible, to match some comparable C code. My idea was to use static mut in a small module that lets me read or update a value without exposing the data structure. So long as I don't return references I don't see how I could get in trouble (obviously this only works in a single-threaded context; otherwise I would need to use some concurrency primitive). But you are correct that I am giving up some static safety guarantees and need to be careful. Still, there seems to be no other way to have a zero-overhead global data structure. I know the "Rust way" is to pass everything to every function always, but it has made my code much less readable and more painful to write when you have to pass a bunch of extra data structures in every function call.
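
For comparison, the same "small module that hides the data structure" idea can be sketched without `unsafe` by using a thread-local `RefCell`. The module name `table` and its `set` / `get` functions are hypothetical, and this variant is not zero overhead: each call pays for a thread-local lookup and a `RefCell` borrow check.

```rust
mod table {
    use std::cell::RefCell;
    use std::collections::HashMap;

    thread_local! {
        // Private: callers never see the map or a reference into it.
        static X: RefCell<HashMap<u64, u64>> = RefCell::new(HashMap::new());
    }

    /// Inserts or overwrites `key`, returning the previous value if any.
    pub fn set(key: u64, value: u64) -> Option<u64> {
        X.with(|x| x.borrow_mut().insert(key, value))
    }

    /// Copies the value out, so no borrow escapes the module.
    pub fn get(key: u64) -> Option<u64> {
        X.with(|x| x.borrow().get(&key).copied())
    }
}

fn main() {
    table::set(1, 10);
    table::set(1, 11);
    println!("{:?}", table::get(1));
}
```

Because values are copied out rather than borrowed, the "don't return references" rule is enforced by the module's API instead of by convention.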

Sooo what's wrong with once_cell::Lazy, then?

yeah, what's wrong with the once_cell mutex approach then? even if you pass it around you'll need the mutex, and I don't think once_cell has much of a performance overhead

what's wrong with once_cell::Lazy, then?

Nothing. It has zero performance overhead, so it is exactly what I was looking for. However, if you combine it with a Mutex or thread_local, then that is no longer true.

Alright, but then we are back at square 1. It is fine to have a static global immutable table of data. But mutating a global is already an anti-pattern, even in C, even in a single-threaded context.

Perhaps you should tell us what the higher-level goal is that you are trying to achieve (and by that, I mean something higher level than "it is annoying to have to pass one more argument"), so we could perhaps help redesign your current architecture.


For the purpose of teaching myself, I am trying to build a byte-code VM based on the one in Crafting Interpreters. However, that was written in C and I am trying to do it in Rust. I want to get feature parity and performance parity. This means that I need access to the stack, global variables, global functions, string intern array, garbage collection grey stack, objects, etc. all over the program. In his example he uses a global VM, so he has access to all those resources throughout his program. I was having a really hard time trying to convert this to Rust because of the recommendation against globals. I tried to create a VM struct and pass that everywhere, but the problem comes when I need to access different fields from different functions with different mutability: the borrow checker was not happy. I am sure a lot of this comes from my lack of experience with idiomatic Rust (I have a C/C++ background), and I am not expecting a solution here. Hopefully, as I work on this problem, a more "safe" solution will become apparent.

FWIW, hashbrown 0.9's with_hasher is a const fn now, so maybe std will follow suit. I guess the challenge is still finding a hasher that you can const-initialize.


I usually tackle issues like this by making a new struct, turning the extra data into its fields and making the functions method calls for that struct. Not sure if it's applicable for your case, but I thought I'd throw it out there regardless.
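
A minimal sketch of what that could look like for the VM case, assuming a made-up `Vm` struct with only two fields (`stack` and `globals`); the names and methods are purely illustrative.

```rust
use std::collections::HashMap;

// Hypothetical VM state; field names are illustrative only.
struct Vm {
    stack: Vec<u64>,
    globals: HashMap<String, u64>,
}

impl Vm {
    fn new() -> Self {
        Vm { stack: Vec::new(), globals: HashMap::new() }
    }

    // Pops the top of the stack and stores it under `name`.
    fn define_global(&mut self, name: &str) -> Option<u64> {
        let value = self.stack.pop()?;
        self.globals.insert(name.to_string(), value);
        Some(value)
    }

    // Pushes the value of a global, borrowing two fields at once:
    // `globals` immutably and `stack` mutably. This is accepted because
    // the borrows are of disjoint fields inside one method body.
    fn load_global(&mut self, name: &str) -> bool {
        match self.globals.get(name) {
            Some(value) => {
                self.stack.push(*value);
                true
            }
            None => false,
        }
    }
}

fn main() {
    let mut vm = Vm::new();
    vm.stack.push(42);
    vm.define_global("answer");
    vm.load_global("answer");
    println!("{:?}", vm.stack);
}
```

The borrow checker tends to complain when a helper takes `&mut self` while a reference into another field is still alive; accessing the fields directly inside one method, as in `load_global`, or passing the individual fields to free functions, usually avoids that.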

