Quickly hashing tiny objects

I have a program where over 70% of my time is spent hashing tiny objects, such as u64, (u32, u64), and (u64, Wrapping<u64>).

I do not need cryptographically strong hashes, just "reasonable" ones (in particular, I wouldn't want the hash function for u32 or u64 to just be the identity function).

It seems (from looking at valgrind's cachegrind output) that the amount of code produced for hashing a u64 is significant. I tried a couple of alternative crates, such as seahash, but even then, if I do something like the following to hash a single u64, the number of instructions generated is huge. Is there a better way to hash small objects (other than writing some small custom hash functions), or could I be misinterpreting the profiles I'm seeing?

use std::hash::{Hash, Hasher};
use std::num::Wrapping;

pub fn do_hash<T>(obj: T) -> Wrapping<usize>
where
    T: Hash,
{
    let mut hasher = seahash::SeaHasher::default();
    obj.hash(&mut hasher);
    Wrapping(hasher.finish() as usize)
}

If your keys are integers, you could try using a BTreeMap and skip the hashing entirely.

fxhash is good at hashing small integers.
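For a sense of why fxhash is so cheap on small integers, here is a minimal, stdlib-only sketch of its core rotate-xor-multiply step (the constant is the 64-bit multiplier used by the Fx family of hashes; the function names are mine, not the crate's API):

```rust
use std::num::Wrapping;

// Multiplier used by the Fx family of hashes (hedged: taken from
// rustc's FxHasher; this sketch is not the fxhash crate itself).
const K: Wrapping<u64> = Wrapping(0x51_7c_c1_b7_27_22_0a_95);

// Mix one 64-bit word into the running state: rotate, xor, multiply.
fn fx_mix(state: Wrapping<u64>, word: u64) -> Wrapping<u64> {
    (Wrapping(state.0.rotate_left(5)) ^ Wrapping(word)) * K
}

// Hashing a single u64 is just one mix step: a handful of instructions.
fn fx_hash_u64(x: u64) -> u64 {
    fx_mix(Wrapping(0), x).0
}

fn main() {
    // The result is not the identity function, and distinct inputs
    // get distinct hashes here (a single multiply by an odd constant
    // is a bijection on u64).
    assert_ne!(fx_hash_u64(1), 1);
    assert_ne!(fx_hash_u64(1), fx_hash_u64(2));
    println!("{:016x}", fx_hash_u64(1));
}
```

The whole thing is a rotate, a xor, and a multiply, which is why it inlines down to a few instructions where a general-purpose hasher like SipHash or SeaHash cannot.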


I agree that BTreeMap is a good data structure, and some people do reach for HashMaps too quickly. In my case, however, I'm not actually using the hashes for hash maps but for something else.

I don't want to get into details (if I can avoid it). I'm implementing this academic paper: [1911.04783] Permutation group algorithms based on directed graphs (extended version), which basically comes down to hashing lots of things, adding up lists of the resulting hashes (I realise adding hashes isn't a usual thing to do either!), and seeing whether they are equal.


So it seems fxhash was the right choice. In particular, it has a hash64 function which, if I also enable link-time optimisation, gets all the hashing inlined, leading to a half-dozen CPU instructions. Now hashing is only taking 15% of my runtime, so I'll go look for other things to optimise :slight_smile:

