Multi-Key Hash Map

There is an interesting stack overflow thread which I attempted to answer a while ago asking about creating a Multi-Key hash map in Rust:

I think this should be possible to made safe in a 0 cost manner but I am also somewhat of a Rust noob, so maybe not. I have an answer in the above thread which is wrong but have tried to fix it, is the following a valid solution to this problem? If not, under what scenario could it break?

use std::cmp::Eq;
use std::hash::Hash;
use std::collections::HashMap;
use std::mem::ManuallyDrop;

pub struct MultiKeyHashMap<K1, K2, V>
where K1: Eq + Hash,
      K2: Eq + Hash {
    map: HashMap<(K1, K2), V>,
}

impl<K1, K2, V> MultiKeyHashMap<K1, K2, V>
where K1: Eq + Hash,
      K2: Eq + Hash {
    pub fn insert(&mut self, k1: K1, k2: K2, v: V) -> Option<V> {
        self.map.insert((k1, k2), v)
    }

    pub fn contains_key(&self, k1: &K1, k2: &K2) -> bool {
        unsafe {
            let k1_ptr = (k1 as *const K1).cast::<ManuallyDrop<K1>>();
            let k2_ptr = (k2 as *const K2).cast::<ManuallyDrop<K2>>();
            let key = (k1_ptr.read(), k2_ptr.read());
            self.map.contains_key((&key as *const(ManuallyDrop<K1>, ManuallyDrop<K2>)).cast::<(K1, K2)>().as_ref().unwrap_unchecked())
        }
    }

    pub fn remove(&mut self, k1: &K1, k2: &K2) -> Option<V> {
        unsafe {
            let k1_ptr = (k1 as *const K1).cast::<ManuallyDrop<K1>>();
            let k2_ptr = (k2 as *const K2).cast::<ManuallyDrop<K2>>();
            let key = (k1_ptr.read(), k2_ptr.read());
            self.map.remove((&key as *const(ManuallyDrop<K1>, ManuallyDrop<K2>)).cast::<(K1, K2)>().as_ref().unwrap_unchecked())
        }
    }

    pub fn with_capcity(cap: usize) -> Self {
        Self{map: HashMap::with_capacity(cap)}
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn it_works() {
        let mut m = MultiKeyHashMap::<String, String, i32>::with_capcity(10);
        m.insert("hello".to_owned(), "world".to_owned(), 20);
        let (k1, k2) = ("hello".to_owned(), "world".to_owned());
        assert!(m.contains_key(&k1, &k2));
        assert_eq!(m.remove(&k1, &k2), Some(20));
        assert_eq!(m.remove(&k1, &k2), None);
    }
}

This is unsound. K1/K2 can have internal mutability and mutate themselves when eq/hash is called. With your code this will mutate the copies in key, and the original won't be updated, leaving it with an invalid state.

If you want to implement something like then consider using hashbrown, its Equivalent trait should allow this.

2 Likes

How about this:

impl<K1, K2, V> MultiKeyHashMap<K1, K2, V>
where K1: Eq + Hash,
      K2: Eq + Hash {
    pub fn insert(&mut self, k1: K1, k2: K2, v: V) -> Option<V> {
        self.map.insert((k1, k2), v)
    }

    pub fn contains_key(&self, k1: &K1, k2: &K2) -> bool {
        unsafe {
            let k1_ptr = (k1 as *const K1).cast::<ManuallyDrop<K1>>();
            let k2_ptr = (k2 as *const K2).cast::<ManuallyDrop<K2>>();
            let key = (k1_ptr.read(), k2_ptr.read());
            let res = self.map.contains_key((&key as *const(ManuallyDrop<K1>, ManuallyDrop<K2>)).cast::<(K1, K2)>().as_ref().unwrap_unchecked());
            (k1 as *const K1 as *mut K1).copy_from((&key.0 as *const ManuallyDrop<K1>).cast::<K1>(), 1);
            (k2 as *const K2 as *mut K2).copy_from((&key.1 as *const ManuallyDrop<K2>).cast::<K2>(), 1);
            return res;
        }
    }

    pub fn remove(&mut self, k1: &K1, k2: &K2) -> Option<V> {
        unsafe {
            let k1_ptr = (k1 as *const K1).cast::<ManuallyDrop<K1>>();
            let k2_ptr = (k2 as *const K2).cast::<ManuallyDrop<K2>>();
            let key = (k1_ptr.read(), k2_ptr.read());
            let res = self.map.remove((&key as *const(ManuallyDrop<K1>, ManuallyDrop<K2>)).cast::<(K1, K2)>().as_ref().unwrap_unchecked());
            (k1 as *const K1 as *mut K1).copy_from((&key.0 as *const ManuallyDrop<K1>).cast::<K1>(), 1);
            (k2 as *const K2 as *mut K2).copy_from((&key.1 as *const ManuallyDrop<K2>).cast::<K2>(), 1);
            return res;
        }
    }

    pub fn with_capacity(cap: usize) -> Self {
        Self{map: HashMap::with_capacity(cap)}
    }
}

There are safe ways to accomplish this with std's HashMap too.

More explanation / a longer example.

1 Like

Why include an unsafe implementation on SO? The safe implementation you posted there looks good, and has exactly the same API and performance as the unsafe version, as far as I can tell.

The other safe implementation either:

use dyn which has runtime cost.

use a double hashmap which has runtime cost.

It's trivially simple to implement this with runtime cost, I am interested in a 0 runtime cost solution.

I was asking about your safe impl, not the others, but now I see: your safe impl returns the keys. I didn't see that difference at first.

1 Like

Your approach isn't 0 cost either. The keys you read could be arbitrarily large.

And your update didn't solve the problem. In fact, it added unconditional UB by writing to memory behind a shared reference.

2 Likes

By 0 cost I mean the solution with the lowest runtime cost. I don't see a way around copying the keys, this is forced in a safe implementation and if Rust allows internal mutability this is the only way. The keys must be copied as you have to form a composite intermediate value to compare against, I don't see any way around that.

I understand this solution isn't safe in some ways, I am more looking at this from an academic perspective. What is the lowest cost solution to this problem? It's not super interesting to solve with dyn and the like. If this isn't it, what is?

In that case I suggest deleting the unsafe solution on SO to prevent use by someone who doesn't understand the situation.

2 Likes

Dynamic dispatch could easily be a lower runtime cost than some large copies.

Copying or something similar is the only way to get a &(K1, K2) from a &K1 and &K2, but it's not the only way to solve the stated problem, as demonstrated.

The most straightforward and sound way to be able to create a &(K1, K2) would be to put Clone bounds on K1 and K2.

There is no sound way for the &(K1, K2) approach without imposing limitations somehow.

4 Likes

Sure, I'll add a disclaimer

I am skeptical that dyn is a good option here. It doesn't make sense to use super-large keys in a map anyways...

If I understand the segfault from your playground, 1 and 2 are allocated in some non-mutable memory section which is why it segfaults. This could be fixed by forcing the function to take in &mut T, but that's not very elegant IMO.

And for Clone, this seems like it may require over-copying. For instance, I should be able to only temporarily copy a string header if the keys are two strings. Clone would definitely solve it, but again make it non-zero cost.

Returning the keys is probably the most elegant solution I can think of, but it's not terribly elegant.

If you want to write back to the location, you need &mut anyway. This approach could be made sound. Basically, a replace_with like approach.

Have you actually checked it is? LLVM often elide dynamic dispatches if it's obviously trackable statically like this case.

4 Likes

Have you tested this? It fails with the same example I previously posted.

It's somewhat evading the question, but when I need some "advanced" usage pattern for HashMap, I just use hashbrown instead. std::collection::HashMap is just a thin wrapper of hashbrown::hash_map::HashMap with less features exposed anyway.

2 Likes

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.