I have a program where over 70% of my time is spent hashing tiny objects, like u64, (u32, u64), and (u64, Wrapping<u64>).
I do not need cryptographically strong hashes, just "reasonable" ones (in particular, I wouldn't want the hash function for u32 or u64 to just be the identity function).
It seems (from looking at Valgrind's cachegrind output) that the amount of code generated for hashing a u64 is significant. I tried a couple of alternative crates, like seahash, but even then, if I do something like the following to hash a single u64, the number of instructions generated is huge. Is there a better way of hashing small objects (other than writing small custom hash functions), or could I be misinterpreting the profiles I'm seeing?
use std::hash::{Hash, Hasher};
use std::num::Wrapping;

pub fn do_hash<T>(obj: T) -> Wrapping<usize>
where
    T: Hash,
{
    let mut hasher = seahash::SeaHasher::default();
    obj.hash(&mut hasher);
    Wrapping(hasher.finish() as usize)
}
I agree that BTreeMap is a good data structure and that some people reach for HashMap too quickly; in my case, however, I'm not actually using the hashes for hash maps but for something else.
I'd rather not get into details (if I can avoid it). I'm implementing this academic paper: [1911.04783] Permutation group algorithms based on directed graphs (extended version), which basically comes down to hashing lots of things, adding up lists of the resulting hashes (I realise adding hashes isn't a usual thing to do either!), and seeing if the sums are equal.
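To make the "sum the hashes, compare the sums" idea concrete, here is a minimal sketch in plain std Rust. DefaultHasher stands in for whatever fast hasher you settle on, and `h` and `hash_sum` are made-up names for this sketch, not anything from the paper:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::num::Wrapping;

// Hash one value with a fresh hasher (hypothetical helper).
fn h<T: Hash>(x: &T) -> Wrapping<u64> {
    let mut hasher = DefaultHasher::new();
    x.hash(&mut hasher);
    Wrapping(hasher.finish())
}

// Sum the hashes of a slice with wrapping arithmetic. Two slices
// containing the same multiset of elements give the same sum,
// regardless of element order.
fn hash_sum(xs: &[u64]) -> Wrapping<u64> {
    xs.iter().map(h).sum()
}

fn main() {
    // Same elements, different order: sums agree.
    assert_eq!(hash_sum(&[1, 2, 3]), hash_sum(&[3, 1, 2]));
    // Different multisets: sums (almost certainly) disagree.
    assert_ne!(hash_sum(&[1, 2, 3]), hash_sum(&[1, 2, 3, 3]));
}
```

Because addition commutes, this gives an order-independent fingerprint of a collection; the trade-off is that colliding sums are possible, so equality of sums is evidence, not proof, that the multisets match.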
So, it seems fxhash was the right choice. In particular, it has a hash64 function which, combined with link-time optimisation, leads to all the hashing getting inlined, down to half a dozen CPU instructions. Hashing is now only 15% of my runtime, so I'll go look for other things to optimise.
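For anyone wondering why fxhash compiles down to so few instructions: its per-word step is, as far as I can tell, just a rotate-xor-multiply mix. The sketch below is my understanding of the 64-bit FxHash step (the rotation amount and multiplier are assumptions about the crate's internals, not copied from it), written against std only:

```rust
use std::num::Wrapping;

// Multiplier I believe FxHash uses for 64-bit words (assumption).
const SEED: Wrapping<u64> = Wrapping(0x51_7c_c1_b7_27_22_0a_95);

// FxHash-style mix: for each word, rotate the state, xor the word
// in, then multiply by a large odd constant. No loops over bytes,
// no finalisation rounds, so it inlines to a handful of instructions.
fn fx_hash(words: &[u64]) -> u64 {
    let mut hash = Wrapping(0u64);
    for &w in words {
        hash = (Wrapping(hash.0.rotate_left(5)) ^ Wrapping(w)) * SEED;
    }
    hash.0
}

fn main() {
    // Different inputs mix to different values.
    assert_ne!(fx_hash(&[1]), fx_hash(&[2]));
    // Unlike a sum of hashes, this mix is order-sensitive.
    assert_ne!(fx_hash(&[1, 2]), fx_hash(&[2, 1]));
}
```

Note the multiply is the only "spreading" step, so low-entropy inputs keep some structure in the low bits; that's the price fxhash pays for speed, and it's fine here since I'm not feeding the hashes into a table whose size is a power of two.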