Trait proposal: hash::ToHash


#1

Generating hash values is sort of awkward;

let foo = 1;
let mut hasher = SipHasher::default();
foo.hash(&mut hasher);
let hashed_foo = hasher.finish();

It would be much nicer if we could just call 1.to_hash(). Another limitation of the hash::Hash trait is that it makes hash memoization difficult. Hashing large objects can be an expensive operation; it would be nice if we could build a wrapper object that precomputes the hash on creation and then uses the precomputed hash each time its needed.

Here’s what I’m proposing:

trait ToHash<H>: PhantomFn<H> {
    fn to_hash(&self) -> u64;
}

The idea is to to provide a method that returns an object’s hashed value. The trait could be implemented for any type that implements Hash and any Hasher that implements Default:

impl<T> ToHash<SipHasher> for T where T: Hash {
    fn to_hash(&self) -> u64 {
        let mut hasher = SipHasher::default();
        self.hash(&mut hasher);
        hasher.finish()
    }
}

Aside from simplifying hash generation, the trait would enable hash memoization, for instance the following wrapper could be used;

struct Hashed<T,H> {
    hash: u64,
    value: T,
    marker: PhantomData<H>,
}
impl<T,H> Hashed<T,H> where T: Hash, H: Hasher + Default {
    fn new(value: T) -> Hashed<T,H> {
        let mut hasher = <H as Default>::default();
        value.hash(&mut hasher);
        Hashed {hash: hasher.finish(), value: value, marker: PhantomData}
    }
}
impl<T> ToHash<SipHasher> for Hashed<T, SipHasher> {
    fn to_hash(&self) -> u64 {
        self.hash + 1
    }
}

The impact on the rest of std would be fairly minimal; the trait could replace Hash in a few places where memoization would be useful. For instance, HashSet could be changed to look something like:

pub struct HashSet<T, H> where T: ToHash<H>, H: Hasher { ... }

This would allow Hashed<T,H> instances to be added to multiple sets without recomputing the hash each time.

Sound useful to anyone?


#2

No it wouldn’t. Not in general at least. Hashers have state. In the case of SipHash that’s something like a 128 bit random key. If you tried to use the hashes from one HashSet in another it probably wouldn’t work. The memoized hashes you insert the value under wouldn’t match a lookup with an equivalent key that isn’t memoized. Your HashSets would become blackholes where values enter but can never be found.