How to design a API to insert two values into a HashMap at the same time and get two immutable references to these values, while avoid two hashmap queries for each value?

Hi there,
I'm writing some logic involving insert two value to a HashMap with two different keys, and return two immutable references to these inserted value stored in the HashMap.

The most intuitive solution is like this:

let mut map = HashMap::new();
map.insert(1, 100);
map.insert(2, 200);

(map.get(1).unwrap(), map.get(2).unwrap())

However, the problem is that for each key, we performed two hashmap queries actually (1 for insert and 1 for get). Although an additional query is tolerable in my current work, I'm just wondering if it's possible to save this additional query. Using Entry API is obviously non-applicable, because HashMap::entry requires a mutable borrow to self, and return reference get via entry would extend the lifetime of the mutable self borrow, and the final result is that we are trying to create two mutable borrows to a HashMap at the same time which would be rejected by the borrow checker. Besides, runtime interior mutability like RefCell seems not work here.

I understand that Rust HashMap use quadratic probing instead of linked list buckets to handle hash conflicts and so insert a value might trigger rehashing, so even I hold an immutable reference to the first inserted value, the second insertion might invalidating the former reference, and that's why such operation is not allowed by borrow checker via single mutable reference rule. But if avoiding the additional query really matters, can we only write a HashMap by ourselves or if there is any crate implemented such functionality?

The Rust hash map uses hashbrown for its implementation. The hashbrown crate provides some additional functionality, though I'm not sure if it has what you're looking for.

1 Like

This might be helpful: https://docs.rs/hashbrown/latest/hashbrown/struct.HashMap.html#method.get_many_key_value_mut

2 Likes

Having some IndexMap often makes your life easier.

let mut map = IndexMap::new();

let (i1, _) = map.insert(1, 100);
let (i2, _) = map.insert(2, 200);

(
    map.get_index(i1).unwrap().1,
    map.get_index(i2).unwrap().1,
)
1 Like

That should be insert_full to get the new indexes.

Direct usize indexing works to get the value too, so I would write:

let mut map = IndexMap::new();

let (i1, _) = map.insert_full(1, 100);
let (i2, _) = map.insert_full(2, 200);

(&map[i1], &map[i2])
2 Likes

You could:

  1. Make your first insertion and take the resultant first & reference, convert it to a *const raw pointer, and store it in a variable.
  2. Make your second insertion and store the resultant second & reference in a variable.
  3. Somehow do some sort of run-time check that makes sure that the second insertion wouldn't have changed the memory location of the first value such as by re-hashing or whatever.
  4. Use an unsafe block to convert your *const raw pointer back to a & reference.

This obviously relies on implementation details of HashMap and triggers UB if 3 goes wrong, but may be the easiest way to do the thing closest to what you're trying to do.

However, I would probably just use an IndexMap, or a custom structure, or perhaps even a fork of hashbrown instead.

1 Like

Maybe could accomplish 3. by instead first calling reserve.

But that's not sufficient to guarantee that the locations won't move. You'd have to look at the implementation.

It also might end up being worse performance too.

2 Likes

Why not store the values in Arc/Rc, clone those then store the clones in the hashmap and return the Arc/Rc

2 Likes

I think the best you can do with an ordinary hashmap::HashMap<K, V> is to skip recomputing the hash using the HashMap::raw_entry() and HashMap::raw_entry_mut() APIs:

use hashbrown::HashMap;
use std::hash::{BuildHasher, Hash};

pub fn insert_two<K, V, S>(
    map: &mut HashMap<K, V, S>,
    (k1, v1): (K, V),
    (k2, v2): (K, V),
) -> (&V, &V)
where
    S: BuildHasher,
    K: Copy + Eq + Hash,
{
    let hasher = map.hasher();
    let hash1 = hasher.hash_one(&k1);
    let hash2 = hasher.hash_one(&k2);
    map.raw_entry_mut()
        .from_key_hashed_nocheck(hash1, &k1)
        .insert(k1, v1);
    map.raw_entry_mut()
        .from_key_hashed_nocheck(hash2, &k2)
        .insert(k2, v2);
    let v1 = map
        .raw_entry()
        .from_key_hashed_nocheck(hash1, &k1)
        .unwrap()
        .1;
    let v2 = map
        .raw_entry()
        .from_key_hashed_nocheck(hash2, &k2)
        .unwrap()
        .1;
    (v1, v2)
}

Of course, you still need to keep around a copy of each key to compare it for equality with the original, in case of a hash collision.

4 Likes

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.