How to add a static map function in struct

In C++ I can write something like the below, where struct ReferenceData serves as a "global" source of data. It defaults to whatever is given by get_default_map(). But users have the option to override it, if they need to.

#include <iostream>
#include <string>
#include <unordered_map>

struct ReferenceData {
	using map_t = std::unordered_map<std::string, std::string>;

	static void set(const std::string& key, const std::string& data) {
		ReferenceData::get_map()[key] = data;
	}

	static std::string get(const std::string& key) {
		return ReferenceData::get_map().at(key);
	}

	static void reset() {
		ReferenceData::get_map() = ReferenceData::get_default_map();
	}

private:
	static map_t& get_map() {
		static map_t ref_data = ReferenceData::get_default_map();
		return ref_data;
	}

	static map_t get_default_map() {
		return {{"USD","NYC,UTC-5"}, {"GBP","LDN,UTC+0"}, {"JPY","TOK,UTC+9"}};
	}
};

int main()
{
	std::cout << "JPY = " << ReferenceData::get("JPY") << "\n"; // TOK,UTC+9
	ReferenceData::set("JPY", "TYO,UTC+10");
	std::cout << "JPY = " << ReferenceData::get("JPY") << "\n"; // TYO,UTC+10

	return 0;
}

How does one implement this pattern in Rust? I have tried something like the below, but I am already failing at the get_map() function implementation, because

use std::collections::HashMap;

pub struct ReferenceData;

impl ReferenceData {
    fn get_map() -> &'static HashMap<String, String> {
        static ref_data: HashMap<String, String> = HashMap::new();
        &ref_data
    }
}

gives this compiler error:

error[E0015]: calls in statics are limited to constant functions,
              tuple structs and tuple variants
 --> src\ref_data.rs:7:51
  |
7 |         static cal_map: HashMap<String, String> = HashMap::new();
  |                                                   ^^^^^^^^^^^^^^

Ideally, I don't want to use any external crates (like lazy_static). Unless it's the only way. Is it possible to replicate the same C++ pattern in Rust?

In C++, a static variable is lazily initialized on first access only if it is defined within a function body. In Rust, static variables have the same semantics wherever they are defined, and the language itself doesn't require any synchronization mechanism to access them. The lazy_static crate implements a synchronized, lazily initialized static variable similar to the one in C++. Another popular implementation is the once_cell crate, which has a more flexible interface.

You can always copy-paste its implementation, though that is definitely not recommended.

In the C++ example, the initialization of your static happens during runtime, but before main is executed. However, Rust does not allow any user code to run before main, so you have to either initialize the static at compile time (using a const function) or provide some means of initializing it at runtime before first use.

As indicated by the error, HashMap::new() is not (yet) a const function, which rules out compile-time initialization. I think lazy_static (or an equivalent) is probably your best bet for initializing a static HashMap for now.

There is a nightly-only experimental lazy module in std, which I understand is based on the once_cell crate, so there may eventually be an easy way to do this without any other libraries.
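
For readers on a newer toolchain: std::sync::OnceLock has since been stabilized (Rust 1.70), so the read-only version of the pattern can be sketched with the standard library alone. This is a minimal sketch under that assumption, not how lazy_static or once_cell is implemented; the map contents are copied from the C++ example.

```rust
use std::collections::HashMap;
use std::sync::OnceLock;

pub struct ReferenceData;

impl ReferenceData {
    // The map is built by the closure on the first call only;
    // every later call returns a reference to the same map.
    pub fn get_map() -> &'static HashMap<String, String> {
        static REF_DATA: OnceLock<HashMap<String, String>> = OnceLock::new();
        REF_DATA.get_or_init(|| {
            let mut m = HashMap::new();
            m.insert("USD".to_string(), "NYC,UTC-5".to_string());
            m.insert("GBP".to_string(), "LDN,UTC+0".to_string());
            m.insert("JPY".to_string(), "TOK,UTC+9".to_string());
            m
        })
    }
}
```

Because the function hands out a shared &'static reference, this version cannot be mutated after initialization.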

That's the case for static variables that are not in block scope. Declaring one within a function body makes it a static local variable, which has similar semantics (different if the initializer panics).

https://en.cppreference.com/w/cpp/language/initialization#Static_local_variables

It's not so much about not wanting to use external crates, but rather wanting to understand how to solve this problem. If I blindly use lazy_static or once_cell, I haven't learned anything, if that makes sense. :)

So I used cargo expand to get the macro expansion of

use std::collections::HashMap;
use lazy_static::lazy_static;

pub struct ReferenceData;

impl ReferenceData {
    fn get_map() -> &'static HashMap<String, String> {
        lazy_static! {
            static ref REF_DATA: HashMap<String, String> = {
                let mut m = HashMap::new();
                m.insert("USD".to_string(), "NYC".to_string());
                m.insert("GBP".to_string(), "LDN".to_string());
                m
            };
        }
        &REF_DATA
    }
}

which is

pub fn get_map() -> &'static HashMap<String, String> {
    struct RefData {
        __private_field: (),
    }
    static REF_DATA: RefData = RefData {
        __private_field: (),
    };
    impl core::ops::Deref for RefData {
        type Target = HashMap<String, String>;
        fn deref(&self) -> &HashMap<String, String> {
            #[inline(always)]
            fn __static_ref_initialize() -> HashMap<String, String> {
                {
                    let mut m = HashMap::new();
                    m.insert("USD".to_string(), "NYC".to_string());
                    m.insert("GBP".to_string(), "LDN".to_string());
                    m
                }
            }
            #[inline(always)]
            fn __stability() -> &'static HashMap<String, String> {
                static LAZY: ::lazy_static::lazy::Lazy<HashMap<String, String>> =
                    ::lazy_static::lazy::Lazy::INIT;
                LAZY.get(__static_ref_initialize)
            }
            __stability()
        }
    }
    impl ::lazy_static::LazyStatic for RefData {
        fn initialize(lazy: &Self) {
            let _ = &**lazy;
        }
    }
    &REF_DATA
}

A few questions:

  1. Am I correct in that LAZY.get(__static_ref_initialize) calls the initialising function __static_ref_initialize() only once, and then stores the result in a Cell smart pointer? And in each call, it returns whatever is stored in the smart pointer?

  2. Is struct RefData just a "front"? Because when using &REF_DATA I don't actually get a reference to RefData but rather a completely different type, i.e. &HashMap<String, String>?

  3. Why are __private_field and impl ::lazy_static::LazyStatic for RefData required? If I comment them out, everything still works.

  4. How do I make my HashMap mutable? I tried inserting mut in a few places, like pub fn get_map() -> &'static mut HashMap<String, String> and &mut REF_DATA and so on. But the compiler is not happy.

As far as I understand, HashMap::new() will never be a const fn, as initializing a HashMap requires generating a random seed for hashing, and random data isn't possible in a const fn, since those are supposed to be deterministic. Looking at the hashbrown crate (the underlying implementation of the standard library HashMap), you can see that it provides const fn construction only when you supply the hasher manually via with_hasher. That hasher can be a custom type, or even the default BuildHasher if you choose to use a fixed predetermined seed via the const fn with_seeds.

If you want to understand how this works, once_cell would be the better one to investigate, mainly because it doesn't use a macro.

The question is "How do I safely mutate a global variable across threads?", and the answer is to use Mutex<T>.
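
To make that concrete, here is a rough sketch of the original C++ interface (set, get, reset) with the map behind a Mutex. It assumes std::sync::OnceLock (Rust 1.70 or newer) instead of an external crate, and get returns an owned String rather than a reference; both are choices made for this sketch, not something from the posts above.

```rust
use std::collections::HashMap;
use std::sync::{Mutex, OnceLock};

pub struct ReferenceData;

impl ReferenceData {
    // Same defaults as get_default_map() in the C++ example.
    fn default_map() -> HashMap<String, String> {
        [("USD", "NYC,UTC-5"), ("GBP", "LDN,UTC+0"), ("JPY", "TOK,UTC+9")]
            .iter()
            .map(|&(k, v)| (k.to_string(), v.to_string()))
            .collect()
    }

    // Lazily created global; the Mutex is what makes mutation safe.
    fn map() -> &'static Mutex<HashMap<String, String>> {
        static REF_DATA: OnceLock<Mutex<HashMap<String, String>>> = OnceLock::new();
        REF_DATA.get_or_init(|| Mutex::new(Self::default_map()))
    }

    pub fn set(key: &str, data: &str) {
        Self::map().lock().unwrap().insert(key.to_string(), data.to_string());
    }

    // Returns a clone so that no reference outlives the lock.
    pub fn get(key: &str) -> Option<String> {
        Self::map().lock().unwrap().get(key).cloned()
    }

    pub fn reset() {
        *Self::map().lock().unwrap() = Self::default_map();
    }
}
```

Calling ReferenceData::set("JPY", "TYO,UTC+10") and then ReferenceData::get("JPY") mirrors the C++ main function; unwrap() here simply panics if the mutex was poisoned by a panic in another thread.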

Looking at the current source of that get method, we can see that the only place it calls f is inside a closure it passes to std::sync::Once::call_once, which, according to its documentation, calls the passed closure exactly once.
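
The "exactly once" behavior of std::sync::Once can be checked in isolation. The sketch below is only an illustration; ensure_initialized and INIT_RUNS are hypothetical names, and the counter stands in for a real initializer.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Once;

static INIT: Once = Once::new();
static INIT_RUNS: AtomicUsize = AtomicUsize::new(0);

// Requests initialization and reports how many times the
// initializer closure has actually run so far.
pub fn ensure_initialized() -> usize {
    INIT.call_once(|| {
        // Runs on the first call only; later calls skip the closure.
        INIT_RUNS.fetch_add(1, Ordering::SeqCst);
    });
    INIT_RUNS.load(Ordering::SeqCst)
}
```

However many times ensure_initialized is called, the counter stays at 1.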

The struct is more or less a "front". Note the Deref trait, which is called whenever the given reference is dereferenced, (besides some weird casting shenanigans, probably). So, pedantically, you are getting a reference to the instance of the RefData struct, but when deref is called you get a &HashMap<String, String>.
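
A stripped-down version of that "front" may make the Deref step easier to see. This sketch swaps lazy_static's internal Lazy type for std::sync::OnceLock (an assumption: Rust 1.70 or newer) and drops __private_field and the LazyStatic impl, so it is not the real expansion, only the same shape.

```rust
use std::collections::HashMap;
use std::ops::Deref;
use std::sync::OnceLock;

// Zero-sized "front" type; the actual map lives in LAZY below.
pub struct RefData;

pub static REF_DATA: RefData = RefData;

impl Deref for RefData {
    type Target = HashMap<String, String>;

    fn deref(&self) -> &HashMap<String, String> {
        static LAZY: OnceLock<HashMap<String, String>> = OnceLock::new();
        LAZY.get_or_init(|| {
            let mut m = HashMap::new();
            m.insert("USD".to_string(), "NYC".to_string());
            m.insert("GBP".to_string(), "LDN".to_string());
            m
        })
    }
}

// &REF_DATA has type &RefData; deref coercion at the return site
// calls deref() and produces the &HashMap<String, String>.
pub fn get_map() -> &'static HashMap<String, String> {
    &REF_DATA
}
```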

My understanding is that those are there to prevent things that could result in bugs, if not outright UB, otherwise. For example, making another &RefData in another thread would cause issues.

The last time I remember using lazy_static I ended up using a Mutex. I don't know if there's a more performant way that is still safe, if you know that there will be no other threads for example, but it was fine for my use case. (And I did in fact end up using multiple threads later, so trying to avoid the small Mutex overhead would not have been worth it.)

In this case, one is probably looking either for std::thread_local! or for something like Fragile. The difference is what happens if multiple threads eventually appear: thread-locals are created in each thread independently, while Fragile values will crash the program with a panic if used from a thread other than the one that created them.
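
For the thread-local option, a minimal sketch could look like the following. The free functions set and get are hypothetical names for this example; RefCell provides the mutability, with no lock involved.

```rust
use std::cell::RefCell;
use std::collections::HashMap;

thread_local! {
    // Each thread lazily gets its own independent copy of the map.
    static REF_DATA: RefCell<HashMap<String, String>> = RefCell::new({
        let mut m = HashMap::new();
        m.insert("USD".to_string(), "NYC".to_string());
        m
    });
}

pub fn set(key: &str, data: &str) {
    REF_DATA.with(|m| {
        m.borrow_mut().insert(key.to_string(), data.to_string());
    });
}

pub fn get(key: &str) -> Option<String> {
    REF_DATA.with(|m| m.borrow().get(key).cloned())
}
```

A value set in one thread is not visible from another thread, which is exactly the caveat described above.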

I am trying to get this to work with once_cell, but I am struggling a bit. As per https://docs.rs/once_cell/latest/once_cell/#lazy-initialized-global-data I tried this:

use std::{sync::Mutex, collections::HashMap};
use once_cell::sync::OnceCell;

pub struct ReferenceData;

impl ReferenceData {
    fn get_map() -> &'static Mutex<HashMap<String, String>> {
        static REF_DATA: OnceCell<Mutex<HashMap<String, String>>> = OnceCell::new();
        REF_DATA.get_or_init(|| {
            let mut m = HashMap::new();
            m.insert("USD".to_string(), "NYC".to_string());
            m.insert("GBP".to_string(), "LDN".to_string());
            Mutex::new(m)
        })
    }

    pub fn get(key: &str) -> Option<&String> {
        ReferenceData::get_map().lock().unwrap().get(key)
    }
}

But I get:

error[E0515]: cannot return reference to temporary value
  --> src\ref_data.rs:19:9
   |
19 |         ReferenceData::get_map().lock().unwrap().get(key)
   |         ----------------------------------------^^^^^^^^^
   |         |
   |         returns a reference to data owned by the current function
   |         temporary value created here

How would I fix this?

You can't return long-lived references to something behind a mutex (without holding on to the lock some place). Possible approaches include:

  • taking a callback in get and passing the &String to the callback instead of returning it. (The mutex will stay locked for the duration of the callback, which may or may not be what you want.)
  • creating an owned copy of the String by cloning it. (I.e. return String instead of &String from the get method.)
  • in order to avoid cloning overhead while still passing out "owned values", you could use HashMap<String, Arc<String>> and return clones of the contained Arc<String> from the get method (cheap to clone, no allocations or copying the data needed, will just update a reference count.)
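
The third option can be sketched roughly as below. To keep it free of external crates it uses std::sync::OnceLock (Rust 1.70 or newer) where the question used once_cell; the free functions map, get and set are names made up for this example.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex, OnceLock};

fn map() -> &'static Mutex<HashMap<String, Arc<String>>> {
    static REF_DATA: OnceLock<Mutex<HashMap<String, Arc<String>>>> = OnceLock::new();
    REF_DATA.get_or_init(|| {
        let mut m = HashMap::new();
        m.insert("USD".to_string(), Arc::new("NYC".to_string()));
        m.insert("GBP".to_string(), Arc::new("LDN".to_string()));
        Mutex::new(m)
    })
}

// Cloning the Arc only bumps a reference count; the String is shared.
pub fn get(key: &str) -> Option<Arc<String>> {
    map().lock().unwrap().get(key).cloned()
}

pub fn set(key: &str, data: &str) {
    map().lock().unwrap().insert(key.to_string(), Arc::new(data.to_string()));
}
```

A caller that still holds an Arc from an earlier get keeps seeing the old value after a set, because set replaces the map entry rather than modifying the shared String.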

Is there any way to do this without a mutex?

Do you have an example? I tried

    pub fn get(key: &str) -> &String {
        let var = ReferenceData::get_map().lock().unwrap().get(key).unwrap();
        return var;
    }

But this doesn't compile (returns a value referencing data owned by the current function). So I assume this is not what you meant.

I was on mobile earlier, otherwise I might’ve given one anyways. I was talking about something like

    pub fn with<F, R>(key: &str, callback: F) -> R
    where
        F: FnOnce(Option<&String>) -> R,
    {
        callback(ReferenceData::get_map().lock().unwrap().get(key))
    }

Users would need to use a callback, so instead of

let value = ReferenceData::get("...");
let something = do_something_with(value);
let something_else = do_something_else_with(value);

something like

let (something, something_else) = ReferenceData::with("...", |value| {
    let something = do_something_with(value);
    let something_else = do_something_else_with(value);
    (something, something_else)
});

or

let (something, something_else) = ReferenceData::with("...", |value| {
    (
        do_something_with(value),
        do_something_else_with(value),
    )
});

(Passing data back out of the callback like this only works if the types of, e.g., something and something_else do not contain any lifetimes that depend on the lifetime of the &String.)

The problem is that you said you want your map to be mutable. If it wasn’t, then a method like fn get(key: &str) -> Option<&'static String> wouldn’t be a problem. If it is mutable, then you cannot offer any method that could be used to obtain a &'static String reference into the map. Holding an immutable reference into the map means it cannot be mutated while holding the reference; and since a caller might hold onto the reference indefinitely, you couldn’t mutate the map at all anymore after a single “get” call.

You can also expose the mutex guard if you like, or offer your own guard object that’s wrapping it.
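
Exposing the guard might look something like this. It is a sketch only: the method name lock is hypothetical, and std::sync::OnceLock (Rust 1.70 or newer) stands in for once_cell.

```rust
use std::collections::HashMap;
use std::sync::{Mutex, MutexGuard, OnceLock};

pub struct ReferenceData;

impl ReferenceData {
    // The caller owns the guard, so references borrowed from it are
    // valid for as long as the caller keeps the guard (and the lock).
    pub fn lock() -> MutexGuard<'static, HashMap<String, String>> {
        static REF_DATA: OnceLock<Mutex<HashMap<String, String>>> = OnceLock::new();
        REF_DATA
            .get_or_init(|| {
                let mut m = HashMap::new();
                m.insert("USD".to_string(), "NYC".to_string());
                Mutex::new(m)
            })
            .lock()
            .unwrap()
    }
}
```

With this, let guard = ReferenceData::lock(); followed by guard.get("USD") yields an Option<&String> that borrows from guard. The lock is held until guard is dropped, so calling lock() again on the same thread before then would deadlock or panic.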


On an unrelated note: if mutation is rare, and exclusive locking for a common immutable use case seems too harsh, using RwLock might be a reasonable alternative. (This wouldn’t help at all with the lifetime problems of a get method like yours.)
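
As a rough sketch of the RwLock variant (again assuming std::sync::OnceLock from Rust 1.70 or newer, with hypothetical free functions map, get and set): readers take the shared read lock, and only set takes the exclusive write lock.

```rust
use std::collections::HashMap;
use std::sync::{OnceLock, RwLock};

fn map() -> &'static RwLock<HashMap<String, String>> {
    static REF_DATA: OnceLock<RwLock<HashMap<String, String>>> = OnceLock::new();
    REF_DATA.get_or_init(|| RwLock::new(HashMap::new()))
}

// Many threads can hold the read lock at the same time.
pub fn get(key: &str) -> Option<String> {
    map().read().unwrap().get(key).cloned()
}

// The write lock waits until all readers are done and blocks new ones.
pub fn set(key: &str, data: &str) {
    map().write().unwrap().insert(key.to_string(), data.to_string());
}
```

Note that get still returns a clone, for the same lifetime reason as with the Mutex.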

This makes sense, and I actually recommend looking at what the C++ compiler does for you (I moved get_map and get_default_map into separate functions to make the code easier to understand).

As you can see, the C++ compiler does for you the things that the Rust crates are doing. In C++ it's part of the language and can't be changed, but the idea is the same: there is a guard variable, initialization happens just once, and then the same value is returned again and again.

The only difference is destructors: C++ calls them while the Rust implementations don't, but in practice this C++ behavior is widely considered to be a design mistake.

Really appreciate your answer, but I am kind of lost now. I think I just don't know enough about how callbacks or the with function work. Quite the can of worms.

In case the fact that I wrote "with" as if it were a method confused you, I’ve fixed that. I was just referring to the fn with whose definition I provided above.

The reason such a function can “return” a &String to a closure is that the mutex guard, locking the mutex, still exists further up on the stack while the callback is executed. And a callback F: FnOnce(&String) -> R is actually generic, i.e. a shorthand for for<'a> F: FnOnce(&'a String) -> R, which ensures/enforces that F is able to work with an arbitrarily shortly lived reference to the String.
