Ownership issue with a static HashMap

Hi everyone,

I am new to Rust. I decided to port a small part of a big C/C++ project into Rust, but I am already having to fight the borrow checker. Here's my (simplified) code:

lazy_static! {
    static ref REGISTRY: Mutex<HashMap<String, String>> = Mutex::new(HashMap::new());
}
pub fn register(name:String, value:String) {
    REGISTRY.lock().unwrap().insert(name, value);
}
pub fn find(name: &str) {
    REGISTRY.lock().unwrap().get(name)
}

When I try to compile, I get this error:

error[E0515]: cannot return value referencing temporary value
  --> src/iocsh.rs:63:5
   |
63 |     REGISTRY.lock().unwrap().get(name)
   |     ------------------------^^^^^^^^^^
   |     |
   |     returns a value referencing data owned by the current function
   |     temporary value created here

I don't understand why it doesn't let me return the Option<&String> as returned by get. My understanding of the compiler message is that the returned Option has a reference to the REGISTRY reference on the function's stack, but that does not make sense to me.

My intent is to have a global (I know - see PS) HashMap that is the owner of the values I insert into it. The values that I insert into the map will not be removed from it. The values in the HashMap will not be modified - I want read-only access to them after they are inserted, and these accesses will not outlive the HashMap. How can I express that intent properly?

I tried using RefCell and Arc in the value type but at that point I was just throwing words at the compiler to see what sticks... So maybe I am missing something conceptual. Maybe it's some unforeseen interaction with lazy_static?

Any help is much appreciated!

PS.: I know globals are bad. My code is being compiled as a cdynlib that will be called from C/C++, and I am mimicking the original code as close as possible. Changing the interface to pass a reference to a HashMap around is not an option...

2 Likes

The lock().unwrap() gives you a MutexGuard, which limits the lifetime of anything you access through it. This will be just a temporary value dropped at the end of your expression, which doesn't let the value from get survive. Try storing that guard in a local first.

At the end of that expression the MutexGuard returned by .lock() will be dropped and the lock is automatically released so you can't access the content behind mutex after this point.

Oh, but you also want to return the borrowed value without holding the lock? Rust doesn't allow this, at least not in safe code. The borrow checker would have to do whole program analysis (which it doesn't) to verify your claim that you'll never modify that value.

If you're really sure of yourself, you could get that value and unsafely mem::transmute to a 'static lifetime.

Even in your simplified example, this could go bad: calling register with the same name twice will drop the old value.

Oh, but you also want to return the borrowed value without holding the lock?

Yes :slight_smile: How does HashMap do it? It clearly owns the value and it is able to return a reference to it. I want find return a value of a type that says: here's a value owned by someone else that can't be modified, only read from, and it will live as long as necessary.

Please don't transmute in this case. Non-local unencapsulated transmute USUALLY result to undefined behavior, which makes compiler explicitly allowed to do whatever it can, including to lauch a missile to you as someone in SO mentioned. This also happens to highly experienced C/++ developer. In fact, it's more likely to happen if you feels comfortable to C/++ programming and such pattern, because unsafe Rust is FAR MUCH MORE UNSAFE than C/++. Safe Rust provides lots of guarantees to the compiler to enable aggressive optimization to make your code faster. And in unsafe context, Rust ASSUMES that you know all those guarantees and all those guarantees are carefully provided by your hand. And by practice, humans cannot provides such guarantees for non-trivial code. Have you heared about the RCE bug in SQLite, which has 13x more test code then actual product code?

4 Likes

If it weren't for the mutex, the map would be static and immutable, and then HashMap::get() could return a statically borrowed value without trouble. But the mutex allows mutation, and all borrows must be limited to the scope of a guard.

If you could arrange to do all of your initialization at once in the lazy_static block, then you wouldn't need the mutex at all.

2 Likes

Variables in Rust are semantically meaningful.

foo().bar() is different from let tmp = foo(); tmp.bar(), because let changes lifetime of foo()'s result from temporary to its containing scope.

RefCell is just a single-threaded Mutex. Same as Rc being single-threaded Arc.

For values that are set once and not modified there are wrappers like once_cell that express this.

6 Likes

One of the ways to solve the lifetime issue is to arrange it so that the logic that uses the "found" value is called within the frame of the find function itself: i.e., instead of returning a reference to a value, you take a closure using the reference and call it:

use ::std::{*,
    collections::HashMap,
    sync::RwLock,
};

type Key = String;
type KeyView = str;
type Value = String;

::lazy_static::lazy_static! {
    static ref REGISTRY: RwLock<HashMap<Key, Value>>
        = RwLock::new(HashMap::new());
}

pub
fn register (name: Key, value: Value)
{
    REGISTRY
        .write().expect("HashMap lock was poisoned")
        .insert(name, value)
    ;
}

pub
fn find<Ret, F : FnOnce(Option<&Value>) -> Ret> (
    name: &'_ KeyView,
    f: F,
) -> Ret
{
    f(
        REGISTRY
            .read().expect("HashMap lock was poisoned")
            .get(name)
    )
}

fn main ()
{
    register("key1".into(), "42".into());
    find("key1", |mb_value| {
        dbg!(mb_value);
    });
}
1 Like

Thanks for all the help, folks! In the end this is what I came up with:

lazy_static! {
    static ref REGISTRY: RwLock<HashMap<String, Arc<String>>> = RwLock::new(HashMap::new());
}
pub fn register(name:String, value:String) {
    REGISTRY.write().unwrap().insert(name, Arc::new(value));
}
pub fn find(name: &str) -> Option<Arc<String>> {
    REGISTRY.read().unwrap().get(name).map(|v| Arc::clone(v))
}

It might be heavy handed but I think it is correct. Performance is not a concern here. Of course, I kicked the can down the road because later I will have to throw this pointer over the ffi fence, but since I am never getting rid of the HashMap contents, a cast from Arc to a raw C pointer will not be harmful, hopefully.

2 Likes

Nice, that is a good solution indeed.

Minor note: Arc<String> means the string bytes are behind double pointer indirection; you can Arc::from(s.into_boxed_slice()) to get type Value = Arc<str>; instead. Insertions would be costlier, though (going from Box<str> to Arc<str> requires copying the bytes).

3 Likes

I see. In my actual use case, I am not using a String as the value type, but rather a custom struct. Do you think I still need some form of into_*? I am still trying to wrap my head around the many transformations available.

Nah, then Arc<Struct> is fine :slight_smile:

1 Like

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.