Hashing with a salt

I'm wrapping an existing C library that provides a hash function, enif_hash.

It takes a salt argument. I gather this helps prevent denial-of-service attacks where the attacker passes in many values that have the same hash code. The attacker doesn't know the salt.

However as this comment says, there's no obvious way to get a salt out of a Rust hasher. I think this means I don't get the benefit of either the enif library's collision protection, or whatever collision protection is built into the Hasher—so users are vulnerable!

I think this code should use Hasher::finish() to get some bits for the salt. Am I right? Or is that cryptographically risky?

The comment is about the Hash trait, which is a user of an already-salted hasher.

When you implement the Hasher trait, you control the salt yourself.

let mut enif = EnifHasher::new_with_salt(salt);
123i32.hash(&mut enif);
let salted_hash = enif.finish();

So the Hash implementation can't access the salt, but it shouldn't need to, because its only job is to extract the data; the actual hashing happens in your hasher.
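For example, here's what an ordinary Hash implementation looks like (a generic sketch, not code from the patch): it writes raw field data and leaves all salting to whichever hasher the caller built.

```rust
use std::hash::{Hash, Hasher};

struct Point {
    x: i32,
    y: i32,
}

// A Hash impl only feeds raw data to whatever hasher the caller supplies;
// it never sees or chooses a salt.
impl Hash for Point {
    fn hash<H: Hasher>(&self, state: &mut H) {
        self.x.hash(state);
        self.y.hash(state);
    }
}
```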

I'm not sure I follow. What are you saying we should do?

The purpose of the patch is to support using Term values as HashMap keys and so forth. So we need to impl Hash for Term.

Term is a thin Rust wrapper around ERL_NIF_TERM, an opaque type provided by the C library. Since it's opaque, we can't efficiently hash it ourselves. enif_hash is the only option.

I know the Hasher is already salted. But that doesn't help! I hope this is clear. Suppose enif_hash returns the same hash for term1 and term2 when the salt is 0. Then this code will feed the same data to the Hasher for those two terms. No Rust Hasher can distinguish them, no matter how it is salted.
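To make that concrete, here is a stand-in for the situation (the inner hash value in the usage below is just a hypothetical constant):

```rust
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hasher};

// Stand-in for the outer, salted Hasher consuming an already-computed
// inner hash -- the only input a Hash impl built on enif_hash can provide.
fn outer_hash(state: &RandomState, inner_hash: u64) -> u64 {
    let mut h = state.build_hasher();
    h.write_u64(inner_hash);
    h.finish()
}
```

If enif_hash gives term1 and term2 the same inner hash, outer_hash receives identical input for both, so they collide no matter what salt the RandomState picked.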

Reading this, I can't quite tell whether you're trying to implement both Hash and a custom Hasher, or just Hash.

Without making any strong statements about cryptographic security, here's what I can tell about enif_hash's salt:

  • enif_hash delegates to make_internal_hash. All other uses of that function in erlang/otp (where it is defined) use a salt of 0, so it is basically only there as a feature for downstream users of enif_hash.
  • enif_hash fully drives the hash computation by walking the Erlang term. This is presumably why you want to extract a salt from the Hasher. The salt is used as the initial value of the computed hash, before any bytes from the term are added.

The Hash trait is not for hashing. It's only there to supply raw, unsalted, unhashed data. This is very different from hashCode() in Java, for example, and it seems to be the opposite of enif_hash too.

The place where actual hashing and salting takes place is the Hasher trait, which is deliberately outside your control from the perspective of the Hash trait. The hasher is given raw bytes by hash-agnostic Hash implementations and hashes them, with a salt if it wants to.

  • If you can access underlying data of the Term, then do so, and write all of its data into state, using as many calls as you need.

  • If the Term interface doesn't give you access to its data, only an already-computed hash of its data, then don't worry about the salt. Whatever you write in your Hash implementation will be used as input for the supplied Hasher.
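A minimal sketch of that second case, with Term and enif_hash replaced by hypothetical stand-ins (the real enif_hash is an FFI call into the Erlang VM, and Term wraps an opaque ERL_NIF_TERM):

```rust
use std::hash::{Hash, Hasher};

struct Term(u64); // stand-in for the opaque ERL_NIF_TERM wrapper

// Hypothetical stand-in for the C function; only its shape matters here.
fn enif_hash(term: &Term, salt: u64) -> u64 {
    term.0.wrapping_mul(0x9E37_79B9_7F4A_7C15).wrapping_add(salt)
}

impl Hash for Term {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // Pick an arbitrary fixed salt and hand the result to the Hasher,
        // which applies its own salting on top.
        state.write_u64(enif_hash(self, 0));
    }
}
```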


Sure, ideally Hash implementations pass raw, unhashed data to the hasher.

But a Hash can feed the Hasher anything consistent with Eq, even nothing at all. Of course, the less data you give the Hasher, the worse the hash will be. (If there's more to the contract, it should be documented. As it stands, it looks like implementations are given the slack to make tradeoffs in cases like ours.)

The value computed by enif_hash is consistent with Eq, precise enough, and dramatically faster than walking the term from Rust, so we're going with that. The resulting hash will be as good as enif_hash or the Hasher, whichever is worse.

A hash composed with another hash, although inefficient, should not really be worse than the inner hash alone.

So yes, just provide whatever arbitrary salt you want (usually 0 in that case) to get, from the returned FFI hash, some bytes representing your struct. Then, in your impl Hash, feed those bytes to the Hasher. Provided the FFI hash is decent, it should be fine.

The only way it's worse is that the inner hash is capable of salting, but (in the current patch) we aren't actually salting.

This defeats the outer hasher's salting. If two terms have the same inner hash, the outer hasher can't distinguish them.

That's why I'm considering using hasher.finish(): to get a deterministic value that incorporates the outer hasher's salt, suitable for use as a salt for the inner hash.
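Sketched with hypothetical stand-ins for Term and enif_hash, the idea looks like this (Hasher::finish takes &self, so probing it leaves the hasher usable):

```rust
use std::hash::{Hash, Hasher};

struct Term(u64); // stand-in for the opaque term wrapper

// Hypothetical stand-in for the C function.
fn enif_hash(term: &Term, salt: u64) -> u64 {
    term.0.wrapping_mul(0x9E37_79B9_7F4A_7C15) ^ salt
}

impl Hash for Term {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // finish() on the not-yet-written-to hasher yields a value that
        // depends on its salt but not on this term, so it can serve as a
        // deterministic salt for the inner hash.
        let salt = state.finish();
        state.write_u64(enif_hash(self, salt));
    }
}
```

One caveat: if the Term is nested inside a larger key, the hasher already contains data, so the probed value varies with the prefix. It is still the same for equal keys, which is all Hash + Eq requires.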

Okay. I think I see the problem you are trying to describe:

Suppose that an attacker has somehow prepared a large repository of [u8]s that all produce the same hash value when fed to DefaultHasher::new(). I.e.

use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

fn unsalted_hash<T: Hash>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

// Let's just assume that this is true:
assert_eq!(
    unsalted_hash(b"Hello world!"),
    unsalted_hash(b"pwned"),
);

Theoretically, if it existed, such a repository could be used to perform a DOS attack by filling a hash map with collisions. However, the repository would take a very long time to build, and would thus have to be prepared ahead of time.

Rust's random initialization prevents this attack, by making the initial state of the hasher unpredictable. With the hasher in a different state, the keys already prepared by the attacker will most likely no longer produce collisions for any arbitrary HashMap. That is to say:

use std::collections::hash_map::RandomState;
use std::hash::BuildHasher;

fn salted_hash<T: Hash + ?Sized>(init: &RandomState, value: &T) -> u64 {
    let mut hasher = init.build_hasher();
    value.hash(&mut hasher);
    hasher.finish()
}

let init = RandomState::new();

assert_ne!(
    salted_hash(&init, b"Hello world!"),
    salted_hash(&init, b"pwned"),
);

Now... suppose we had a type like this:

struct Unsalted<T>(T);

impl<T: Hash> Hash for Unsalted<T> {
    fn hash<H: Hasher>(&self, hasher: &mut H) {
        // Note &self.0: calling unsalted_hash(self) here would recurse forever.
        hasher.write_u64(unsalted_hash(&self.0));
    }
}

One can see that, regardless of the salt, collisions in unsalted_hash produce collisions in the outer hasher:

assert_eq!(
    salted_hash(&init, &Unsalted(b"Hello world!")),
    salted_hash(&init, &Unsalted(b"pwned")),
);

meaning the precomputed table of collisions is once again viable. To turn this example into the author's problem, imagine that Unsalted holds an Erlang term, and unsalted_hash() simply calls enif_hash with a salt of 0.

Of course, this all hinges on the existence of such a table. I am not an expert in crypto, and I have no idea what kind of resources would be necessary to produce an n-way collision for a cryptographically-secure 64-bit hash function. But if enif_hash has any cryptographic weaknesses, the aforementioned table could be easier to construct, and might even already exist.

Is that right?


I don't really feel qualified to comment on your solution using finish.

That's right.

enif_hash uses this code which looks like Bob Jenkins's 1997 hash. It was not designed to be cryptographically secure. Most hash functions aren't. If the attacker knows the salt, it's easy to produce lots of collisions. (If you like mathy programming puzzles, try it!)

I am not 100% sure that using a random salt would protect against this, but I expect it does.


Yep, that's it. As I (not very clearly) said before:

Which was a soft way to say: the composition of two hashes cannot be stronger than the inner one.

So, if the inner hash is bad, as @jorendorff points out, it is indeed a bad idea to use it as the input of the outer hash.

I guess that a lazy_static random seed could be used to generate a salt for enif_hash, so that feeding its result to the Hasher would not be that bad; this way we would get the consistency of Hash + Eq while not being vulnerable to precomputed collisions.
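A sketch of that idea, with std's OnceLock standing in for lazy_static and RandomState used as a convenient entropy source (both substitutions are mine; how good the salt is depends on them):

```rust
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hasher};
use std::sync::OnceLock;

// One random salt per process, generated lazily on first use. Every
// Hash impl would pass this same value to enif_hash, keeping Hash + Eq
// consistent within the process while staying unpredictable across runs.
fn process_salt() -> u64 {
    static SALT: OnceLock<u64> = OnceLock::new();
    // Finishing an empty, randomly-keyed DefaultHasher yields a value
    // derived from RandomState's random keys.
    *SALT.get_or_init(|| RandomState::new().build_hasher().finish())
}
```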
