I am unsure about the behavior of the update function on the digest::Digest trait. Is it guaranteed that passing in the data in different chunks still results in the same result? I would expect it to be that way, but I didn't find any confirmation for this in the docs. Instead, the first section of the standard library Hasher trait doc mentions that we should not assume that
Yes, exactly. I saw the remark in the standard library docs about the Hasher trait and then wondered what guarantees digest::Digest makes for its update method.
It's not actually spelled out in the trait documentation, which feels like an oversight, but the algorithms the trait is designed to abstract over all provide the kind of consistency you're asking about. For example, the SHA-256 digest of a stream must be the same whenever the sequence of bytes in the stream is the same, regardless of how those bytes are conveyed from the stream to the Digest implementation. The same is true for any other cryptographic digest in serious use.
use sha2::Digest;

fn main() {
    let a = sha2::Sha256::default();
    let a = a
        .chain_update([1u8])
        .chain_update([2u8])
        .finalize();

    let b = sha2::Sha256::default();
    let b = b
        .chain_update([1u8, 2u8])
        .finalize();

    assert_eq!(a, b);
}
Not only does this assertion succeed, but it must succeed, or the SHA-256 implementation is incorrect.
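To see why chunk boundaries cannot matter for a byte-stream hash, here is a minimal sketch using a toy 64-bit FNV-1a hash written against std only. It is not cryptographic and not part of the digest crate; `ToyHash` and `hash_in_chunks` are hypothetical names made up for this illustration. The point is only that the internal state is a function of the bytes absorbed so far, so how the bytes are split across update calls leaves no trace. Real digests such as SHA-256 buffer input into fixed-size blocks internally, but the same principle applies.

```rust
// Toy streaming hash (64-bit FNV-1a). NOT cryptographic and NOT part of the
// `digest` crate; `ToyHash` is a hypothetical name used only for illustration.
struct ToyHash {
    state: u64,
}

impl ToyHash {
    const OFFSET_BASIS: u64 = 0xcbf2_9ce4_8422_2325;
    const PRIME: u64 = 0x0000_0100_0000_01b3;

    fn new() -> Self {
        ToyHash { state: Self::OFFSET_BASIS }
    }

    // Absorb bytes one at a time. Where one call ends and the next begins
    // leaves no trace in the state.
    fn update(&mut self, data: &[u8]) {
        for &byte in data {
            self.state ^= u64::from(byte);
            self.state = self.state.wrapping_mul(Self::PRIME);
        }
    }

    fn finalize(self) -> u64 {
        self.state
    }
}

// Hash `data` by feeding it in pieces of at most `chunk_size` bytes.
// `chunk_size` must be non-zero.
fn hash_in_chunks(data: &[u8], chunk_size: usize) -> u64 {
    let mut hasher = ToyHash::new();
    for chunk in data.chunks(chunk_size) {
        hasher.update(chunk);
    }
    hasher.finalize()
}

fn main() {
    let data = b"the quick brown fox";
    let whole = hash_in_chunks(data, data.len());
    // Every possible chunk size yields the same result as a single update.
    for chunk_size in 1..data.len() {
        assert_eq!(hash_in_chunks(data, chunk_size), whole);
    }
}
```

Swapping `ToyHash` for a real Digest implementation should preserve the same property, since the digest is defined over the byte sequence alone.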
Hasher is trying to solve a very different problem, of computing hashes for hash tables and hash-based data structures out of compound values. In that context, the difference between arguments is meant to correspond to whether the value appears in one field or another, and it is both expected and allowable for the resulting hash to differ even if the concatenated inputs do not.
Note that it’s not just expected, but desirable, for the purposes of Hasher. It’s important that there aren’t patterns of values that are unequal but predictably hash the same. For example, if you’re hashing (String, String)s, you don’t want it to be the case that pairs which differ only in where the split between the two strings falls, such as ("ab", "c") and ("a", "bc"), all have the same hash value. This is as much a part of Rust’s HashDoS resistance as the hash function having a randomized seed is.
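A small sketch of that behavior with the standard library's `DefaultHasher`. The two pairs are my own illustrative choice, and `hash_of` is a hypothetical helper. The assertion relies on std's `Hash` implementation for strings marking where each string ends. A collision between the two values is possible in principle, but vanishingly unlikely.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Hypothetical helper: hash any `Hash` value with a fresh `DefaultHasher`.
fn hash_of<T: Hash>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

fn main() {
    // Both pairs concatenate to "abc", but they are different values.
    let x = ("ab".to_string(), "c".to_string());
    let y = ("a".to_string(), "bc".to_string());

    // The field boundary is part of what gets hashed, so the hashes differ.
    assert_ne!(hash_of(&x), hash_of(&y));
}
```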
One way to look at it is that Hasher’s job is not to take the bytes/numbers/characters you give it and feed them to a hash function, but to serialize the data structure you give it and feed those serialized bytes to a hash function. Not literally any particular serialization format, but similar in that it cares about differences in the shapes of the data, by way of including lengths or delimiters.
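One way to observe this "serialization" is to plug in a Hasher that records the bytes it is given instead of hashing them. This is a sketch only: `Recorder` and `recorded` are hypothetical names, and the exact bytes std writes for a string are an implementation detail that may change between Rust versions, so the code prints them rather than asserting specific values.

```rust
use std::hash::{Hash, Hasher};

// Hypothetical helper: a `Hasher` that records every byte written to it
// instead of computing a hash.
#[derive(Default)]
struct Recorder {
    bytes: Vec<u8>,
}

impl Hasher for Recorder {
    fn write(&mut self, data: &[u8]) {
        self.bytes.extend_from_slice(data);
    }

    fn finish(&self) -> u64 {
        0 // Never used as a real hash value in this sketch.
    }
}

// Return the byte stream that `value`'s `Hash` impl feeds to the hasher.
fn recorded<T: Hash>(value: &T) -> Vec<u8> {
    let mut recorder = Recorder::default();
    value.hash(&mut recorder);
    recorder.bytes
}

fn main() {
    let x = recorded(&("ab", "c"));
    let y = recorded(&("a", "bc"));

    // Inspect what was actually written for each pair.
    println!("{:?}", x);
    println!("{:?}", y);

    // The streams differ even though both pairs concatenate to "abc",
    // and each stream carries more than the three bytes of text.
    assert_ne!(x, y);
    assert!(x.len() > 3);
}
```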
But when you are hashing data where the hash is part of some network protocol or data format, as digest is meant for, it’s important that everyone agrees exactly what the input to the hash function is. So the input is defined to be a sequence of bytes, and only those bytes influence the hash math. If you want to hash a structure that is not a sequence of bytes, you have to define which serialization to use, so that everyone can agree on that part too.
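As a sketch of what "define what serialization to use" can look like for a pair of strings: the encoding below (a u64 little-endian length prefix before each field's UTF-8 bytes) is an assumption invented for this example, not a format defined by the digest crate or by any particular protocol, and `encode_pair` is a hypothetical name.

```rust
// Hypothetical encoding chosen for illustration: each field is written as a
// u64 little-endian length followed by its UTF-8 bytes.
fn encode_pair(first: &str, second: &str) -> Vec<u8> {
    let mut out = Vec::with_capacity(16 + first.len() + second.len());
    for field in [first, second] {
        out.extend_from_slice(&(field.len() as u64).to_le_bytes());
        out.extend_from_slice(field.as_bytes());
    }
    out
}

fn main() {
    let x = encode_pair("ab", "c");
    let y = encode_pair("a", "bc");

    // The length prefixes keep the field boundary in the byte stream, so the
    // two pairs no longer serialize to the same bytes.
    assert_ne!(x, y);
    assert_eq!(x.len(), 8 + 2 + 8 + 1);

    // These bytes are what every party would then feed to the digest,
    // in as many or as few update calls as is convenient.
}
```

Once the encoding is fixed, how the encoded bytes are chunked when passed to update no longer matters; only the agreed byte sequence does.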