A lazily-initialized map


#1

I’m looking for a Map datastructure, where values are immutable (and never removed) once inserted, but where new insertions can happen concurrently and while values are borrowed. What is this datastructure called, and is there a rust implementation?


#2

I’m pretty sure it would be trivial to implement this yourself.


#3

The difficulty is overcoming rust’s single mutable borrow rule, as we may need to insert whole other values are read-only borrowed


#4

If you’re fine with a small runtime overhead, you can use Rc to easily overcome that:

use std::collections::HashMap;
use std::rc::Rc;

fn main() {
    let mut map: HashMap<i32, Rc<String>> = HashMap::new();
    map.insert(1, Rc::new("value1".to_string()));
    map.insert(2, Rc::new("value2".to_string()));
    let borrowed = Rc::clone(&map[&1]);
    map.insert(3, Rc::new("value3".to_string()));
    println!("borrowed: {}", borrowed);
}

Playground


#5

I think the most difficult part is to resolve the data race associated with concurrent access and modification of the hashtable index. Once you have this, shared ownership semantics give you support for inserting new data into the map while entries are borrowed “for free” .


#6

Maybe you’re looking for a concurrent hashmap, like https://github.com/jonhoo/rust-evmap or https://docs.rs/chashmap?


#7

I don’t believe a general data structure like this can exist, at least not one with dynamic size.

Inserting into a hashmap with borrowed values is unsafe not only because of overwriting values, but because if the storage runs out the hashmap will dellocate it and reallocate a bigger size ( just like Vec). Even if you don’t change the existing values, you can cause the map to reallocate and invalidate all existing references.

If you were to make this data structure in Rust, I think it would have to be more of a LinkedList of key/value pairs than an actual HashMap.


#8

I’ve thought about this and actually the Rc solution from @Riateche makes the most sense, for me. It would be nice if I could tie their lifetime to the life of the original object, but I’ll leave that to a later refactor.

Thanks everyone for the suggestions, much appreciated!!


#9

You can use a little bit of unsafe to get compiler-checked lifetimes with Box instead of Rc. You also have to use Mutex or RwLock (or RefCell) to permit inserting via a shared reference, but it seems like you want that anyway. Here’s a module with an interface I believe to be safe using RwLock:

mod stable_map {
    pub struct StableMap<K, V> {
        lock: RwLock<HashMap<K, Box<V>>>,
    }

    impl<K: Eq + Hash, V> StableMap<K, V> {
        pub fn new() -> Self {
            Self {
                lock: RwLock::new(HashMap::new()),
            }
        }

        pub fn get<Q: ?Sized>(&self, k: &Q) -> Option<&V>
        where
            K: Borrow<Q>,
            Q: Eq + Hash,
        {
            let map = self.lock.read().unwrap();
            unsafe { map.get(k).map(|r| &*(&**r as *const _)) }
        }

        pub fn try_insert(&self, k: K, v: V) -> Result<&V, ()> {
            let mut map = self.lock.write().unwrap();
            match map.entry(k) {
                Occupied(_) => Err(()),
                Vacant(e) => unsafe { Ok(&*(&**e.insert(v.into()) as *const _)) },
            }
        }
    }
}

The first unsafe is safe because Box gives its contents a stable address. A reference to the contents of a Box will be valid as long as we (1) never use RwLock to delete or mutate items in the map through a shared reference, and (2) disallow internal references to the lock so that external code can’t do those things either.

“Through a shared reference” is important in that paragraph. It’s safe to write a method like HashMap::remove that takes &mut self, and implement it using RwLock::get_mut. It’s not safe to write a remove method that takes &self, because then you could call it while holding a shared reference to the item about to be removed. And obviously you can’t expose a reference to the inner RwLock because that could be used to do the same thing.

try_insert may cause the HashMap to reallocate, but since none of the Boxes are dropped or mutated in the process, reallocating doesn’t invalidate any extant references. The only reason for the second unsafe is to permit returning a reference to the newly inserted value; it’s safe for the same reason get is.

Playground link. I don’t know if StableMap is the right name for this kind of data structure; it might mean something different in the literature.


#10

I’m gonna have a go at segfaulting this - I’ll let you know how I get on xP