Using custom hash function

Hi,

I have a struct that looks like this:

struct SomeWrapper {
    hmap: HashMap<usize, HashSet<usize>>,
}

I also have this:

impl SomeWrapper {
    pub fn from_range(range: std::ops::Range<usize>) -> SomeWrapper {
        SomeWrapper {
            hmap: range.map(|x| (x, HashSet::new())).collect()
        }
    }
}

Rust's default hashing function is too slow for my purposes. I already did my research and I can make Rust use a custom Hash Function (type HashMapSea<K, V> = HashMap<K, V, BuildHasherDefault<SeaHasher>>; and then using that as the new type), but what I really want is to be able to pass a custom hash function from the outside and use that then. I want this to be able to test different hash functions.

Also, since the methods new and with_capacity aren't available with a custom hash function, how do I emulate that?

Would be glad if someone could help! :slight_smile:

You can make your functions generic over types implementing BuildHasher to allow passing in any hasher to use with a HashMap.

The functions with_hasher() and with_capacity_and_hasher() are the functions that correspond to new() and with_capacity() but allow you to pass in a custom BuildHasher.

2 Likes

Hi, I'll try that out later or tomorrow. I think this will solve my issue. Thanks for the quick answer. I'll report back then.

Hi, I actually just tried it. Sorry for the stupid questions, this is my second week using Rust.

I now get this error:

no method named contains_key found for struct std::collections::HashMap<usize, std::collections::HashSet<usize>, H> in the current scope

method not found in std::collections::HashMap<usize, std::collections::HashSet<usize>, H>

note: the method contains_key exists but the following trait bounds were not satisfied:
H: std::hash::BuildHasherrustc(E0599)
graph.rs(63, 29): method not found in std::collections::HashMap<usize, std::collections::HashSet<usize>, H>

This is my current implementation

struct SomeWrapper<H: BuildHasher> {
    hmap: HashMap<usize, HashSet<usize>, H>,
}

Initialization works fine now, but I can't call certain methods on HashMap now. I tried changing the type to SomeWrapper<H: BuildHasher + Hash> where Hash is std::hash::Hash, but that didn't work either.

I figured it out.

Actually, I didn't figure it out. I can only get it to work with the default std::collections::hash_map::RandomState. But when i try to use the custom use seahash::SeaHasher, it tells me it doesn't implement DefaultHasher. I'm probably using the wrong types or misunderstood something.


This part describes my second problem from when I thought I figured it out:

I was using the wrong signature for the impl. It has to be:
impl<H: BuildHasher> SomeWrapper<H>.

But now I'm facing the following problem: every time I call an associated function from the outside, I have to add a type. I tried adding a default type:
pub struct SomeWrapper<H: BuildHasher = RandomState>

But I still have to call SomeWrapper methods using:
SomeWrapper::<RandomState>::associated_function(...).

Ideally that would just be:
SomeWrapper::associated_function(...)

This would also spare me the redundant use RandomState. Is this possible?

It should be able to infer the types for associated functions if there is enough information. Note that the example in the documentation for with_hasher() doesn't need to annotate the type in the call. It may be helpful to see some specific examples of the types of associated functions you're using.

I'm using this function for example:

pub fn from_file<P: AsRef<Path>>(path: P) -> Result<Graph<RandomState>, Box<dyn std::error::Error>> {
    // do some parsing
    let mut graph: Graph = Graph::<RandomState>::from_range(1..num_nodes + 1); // it complains here about needing the ::<RandomState>
    // some more parsing
}

The idea is to add a second associated function from_file_with_hasher(...) that allows a custom Hasher. The default is also supposed to be seahash::SeaHasher or some Hasher I created, but I can't seem to get it to work unless I use std::collections::hash_map::RandomState. Right now I'm using BuildHasher as the trait of the generic type, which is probably wrong. But when I use use std::hash::Hasher as the trait of the generic type, I get errors like

no method named insert found for struct std::collections::HashMap<usize, std::collections::HashSet<usize>, H> in the current scope

and similar errors for other methods that HashMap usually implements.

I assume Graph is the type that you are implementing here. I think the problem is that the default for the generic applies only to the struct itself, not the functions on it. It is possible that adding the default to the impl block may help, but you probably want a pattern more like that for HashMap in std, where the type is fixed in the impl block for the methods using the default (e.g. new()), and supplied as a generic parameter for anything that has a hasher passed in (e.g. with_hasher()) and anything that takes the HashMap as a self parameter. In both of those generic cases, there is type information coming from the arguments (either from a hasher or self).

BuildHasher is right since it's the type bound for HashMap methods. It allows for the hasher to be reinitialised as needed. If you have something that implements Hasher and Default, then you can get a simple BuildHasher using BuildHasherDefault.

1 Like

Oh, perfect. Thank you for your patience! Finally got it.

You were right, I had to split the impl block. Now I finally grasp why the syntax is impl<T> NewType<T>. But in this case one of the blocks is impl NewType<DefaultType> (I added a type alias for the DefaultType.) Also had to add the type Default as another trait bound. I learned a lot and everything works flawlessly now!


If hope you don't mind, because I'd have two last conceptual questions. Am I right that HashSet::with_hasher(DefaultHasher::default())) and HashSet::default() are essentially the same?

Also, it seems like BuildHasher is only there to provide some initial state for the hashing algorithm? So I could implement a custom BuildHasher instead of relying on the BuildHasherDefault which is really just a default value for the initial state of the Hasher (but I could customize that with a custom implementation of BuildHasher. Hash just implements how a particular type is hashed, Hasher then uses those values in the actual hashing algorithm.

1 Like

That is almost the implementation of HashMap::default(), except that the actual implementation is generic over the hasher (technically, HashBuilder) and just requires it to also implement Default.

BuildHasher provides a way to generate instances of a Hasher. It allows for doing things like generating a new random seed for each hasher, but it could also return a hasher with the same intial state each time, or just increment a counter and use that to initialise the hasher, or keep returning references to the same hasher, or something else (not all of these are necessarily advisable though). BuildHasherDefault is for instances of Hasher that have their own standard way of constructing them that takes no arguments (represented by the Default trait), without really requiring the support of a BuildHasher. Hash specifies how a type is to be fed to the hasher. Hasher then does the actual hashing.

1 Like

Thank you for taking your time with this detailed answer and good explanation! I've got it now. Really learned something.

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.