HashMap/HashSet not implementing Hash?

use std::collections::{HashMap, HashSet};

#[derive(PartialEq, Eq, Hash)]
pub struct Foo {
    a: HashMap<i32, i32>,
    b: HashSet<i32>,
}

This code does not compile for me. Is the situation that (1) I'm doing something stupid, or (2) there is a valid reason why HashMap/HashSet do not implement Hash (even when both key and value implement Hash)?

The order in which hash tables store their elements is unspecified in general, but hashing order affects the final hash value. This could result in two equal hash tables producing different hashes, which is inconsistent with the requirements and definition of hashes. Therefore, hashing containers can't implement Hash themselves.
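
A minimal sketch of the problem, using a hypothetical NaiveMap wrapper: hashing entries in iteration order makes the result depend on that (unspecified) order, so two equal maps can produce different hashes, violating the rule that k1 == k2 must imply hash(k1) == hash(k2).

use std::collections::HashMap;
use std::hash::{Hash, Hasher};

struct NaiveMap(HashMap<i32, i32>);

impl Hash for NaiveMap {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // Iteration order is unspecified and can differ between two equal
        // maps (different random seeds, different insertion histories), so
        // the resulting hash value is not well defined.
        for entry in &self.0 {
            entry.hash(state);
        }
    }
}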

  1. I agree with this logic.

  2. IIRC there are Lisp implementations that (1) support hashmaps/hashsets and (2) allow arbitrary Lisp values as keys (including hashmaps/hashsets). Unless I am misremembering, is there a way to make this work?

I think it's absolutely possible to implement Hash. Java does it by summing the hash values of all entries; Rust could do the same. Rust does implement Eq for hash maps.

Why it's not done, I don't know. My guess is that nobody has implemented it, or that it's deemed too much of a corner use case (it is really expensive).
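
A rough sketch of that Java-style approach (not something std provides): hash each entry with a fixed DefaultHasher and combine the per-entry hashes with wrapping addition, so iteration order no longer matters. HashableMap is a hypothetical wrapper.

use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

struct HashableMap(HashMap<i32, i32>);

impl Hash for HashableMap {
    fn hash<H: Hasher>(&self, state: &mut H) {
        let mut sum: u64 = 0;
        for entry in &self.0 {
            // DefaultHasher::new() produces identical hashers within one
            // program run, so the per-entry hashes are comparable.
            let mut h = DefaultHasher::new();
            entry.hash(&mut h);
            sum = sum.wrapping_add(h.finish());
        }
        // Feed the order-independent combined value into the caller's hasher.
        sum.hash(state);
    }
}

Note that this hard-codes a hasher instead of using the one the caller supplies, which is one of the objections raised further down in this thread.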

Here's a prior attempt:
https://github.com/rust-lang/rust/pull/48366

Is there a simpler implementation of hashbrown (GitHub - rust-lang/hashbrown: Rust port of Google's SwissTable hash map) without all the optimizations? I have seen the CppCon talk on SwissTable and am looking for a minimal implementation.

It couldn't with the current performance guarantees. For that, you would need to generate the hash of each element individually, which might be a lot slower than creating a single state and hashing the elements into it.

What are "the current performance guarantees"? Hash isn't implemented currently so there's nothing to compare against.

It's not about a particular implementation, it's about how Hash and Hasher are supposed to work together. The idea is that for hashing a collection, a single Hasher is created, then each element is hashed into its state. Re-creating a Hasher might thus multiply the time required for hashing if its construction is expensive. It might well be – for example, the default hasher for HashMap uses a high-quality, DoS-resistant random seed, and obtaining such a seed may be slow (I don't know the specifics of that, though).
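
To illustrate the intended division of labour, here is a sketch with a hypothetical MyVec wrapper: the collection's Hash impl threads every element through the single Hasher state it was given and never constructs a hasher of its own.

use std::hash::{Hash, Hasher};

struct MyVec<T>(Vec<T>);

impl<T: Hash> Hash for MyVec<T> {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // Length prefix, as std does for slices, then each element is fed
        // into the same caller-provided state.
        self.0.len().hash(state);
        for item in &self.0 {
            item.hash(state);
        }
    }
}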

In any case, you can use the BTree* collections, which implement Ord + Hash, instead of the Hash* collections.
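
For the original example, assuming i32 keys and values as in the first post, swapping in the ordered collections is enough to make the derive compile:

use std::collections::{BTreeMap, BTreeSet};

// BTreeMap/BTreeSet iterate in key order, so they can and do implement Hash.
#[derive(PartialEq, Eq, Hash)]
pub struct Foo {
    a: BTreeMap<i32, i32>,
    b: BTreeSet<i32>,
}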

Note that it's RandomState::new() that does that, not the BuildHasher implementation, so it wouldn't be an issue for implementing Hash for a HashMap.

(That said, I'm definitely skeptical of even using Eq on a HashMap, let alone Hash.)

AFAICT it's even worse: re-creating the hasher is simply not possible with the API. When you implement Hash, you need to be generic over the Hasher. You get passed a single mutable reference to a single Hasher, with no way to make more out of that. Correct me if I'm wrong, but I think all you could do is choose your own favorite, fixed implementation of a hasher, use that to completely sidestep the intended hashing infrastructure, and then pass the resulting hash value to the original hasher you were supposed to use. But that's a terrible idea IMO; the caller of the Hash::hash function will assume that they choose the hasher and its properties.
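
For reference, this is the relevant signature from std::hash (the provided hash_slice method omitted): the impl only ever sees one borrowed, caller-chosen Hasher.

pub trait Hash {
    // No way to obtain or rebuild a second hasher of the same type from here.
    fn hash<H: Hasher>(&self, state: &mut H);
}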
