I am wondering the advantage of this implementation over just using sha512 (or some other crypto function resistant to collisions).
It's way faster than SHA-512. Cryptographically secure hash functions are intentionally designed to be slow in order to be harder to break. On the other hand, a hash table hash function needs to be as fast as possible, while only requiring reasonable levels of collision-resistance — it doesn't need to be cryptographically secure.
You should not confuse different hash functions, from Wikipedia :
Hash functions are related to (and often confused with) checksums, check digits, fingerprints, lossy compression, randomization functions, error-correcting codes, and ciphers. Although the concepts overlap to some extent, each one has its own uses and requirements and is designed and optimized differently. The hash functions differ from the concepts numbered mainly in terms of data integrity.
A hashtable's properties aren't defined solely by which hash function it uses. In particular the bitpacking done in "swiss tables" allows for a very efficient memory usage.
This talk, as I remember it, is a really great overview for people (like me or you) who don't know much about the subject.
It's true that non-crypto hashes are typically faster, and a better choice for fast hashmap.
However, the SHA family of hashes is meant to be as fast as possible — within their security limits. Cryptographic hash functions are useful for message authentication, where faster is better. If you need a hash function for password storage, then SHA is not slow enough.
Not to mention that SHA1 and SHA2 are not even considered collision-resistant anymore due to some not-so-recently discovered theoretical attack surfaces, so I would not recommend them for any cryptography-related use. They are still fine for integrity checking against accidental errors, something that a hash table's hash function may still not be strong enough for.
I was under the impression that if the hash function is not collision resistant, one can do an "algorithmic big-Oh DOS attack" by turning a HashMap into a LinkedList by forcing lots of entries into a single bucket.
Assuming this is true, doesn't this imply HashMaps need to be "as collision resistant" as crypto hash functions ?
Collisions are normal and expected for HashMaps, so they don't need a collision-resistant hash function. The only problem is in an edge case where an attacker can cause the majority of keys to hash to the same bucket. The hash function doesn't need to be prefect, only stop the worst case.
In the DoS case the attacker is assumed to be remote, and without ability to directly inspect hash table's internal state. I suppose that this is considered difficult enough to attack that not-terrible hash function is enough.
Only SHA-1 has been broken; SHA-2 including SHA-256 and SHA-512 are fine to use. SHA-256 is for example used by TLS 1.3 (on this website).
The "trick" that Rust's hash map uses with its keyed SipHash is that the key is secret (to an external "attacker"). This makes it hard for the attacker to predict which inputs collide.
The attacker is not supposed to be able generate only colliding inputs - they can't predicts which inputs will collide in the hashmap without the key. If they could, they could run the Hash-DoS attack, and SipHash protects against it.
Without protection: the attacker sends you 1 million strings that they carefully prepared to all hash to the same hash bucket (and you use them in a hashmap). The hash table is loaded very unevenly and has pathological performance.
With the keyed hash: The attacker sends 1 million strings but can't predict which collide; the hash table load is spread out evenly, some keys collide and others don't (in the way it's designed to work).
Good call. I got the requirements of a 'good' HashMap hash function wrong.
There was concern that some of the attacks on SHA1 might be extended to SHA2, which is part of why the SHA3 competition happened. But in the years since that seems not to have happened, so SHA2 is now considered stronger than people worried it was previously.
SHA3 (Keccak) is awesome to have as another option, though. It's a completely different internal structure -- so it's very unlikely that an attack that breaks one will break the other. (And it's a new line of research, different from all the block-cipher-based hashes, which is exciting all on its own.)
Paraphrasing ~ 2:50: Speaker acknowledges that a huge amount of performance of hash table is hash function -- and that this talk is going to glossy over it -- and assume we have a 'good' (for whatever definition) hash function.
This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.