I'm looking at https://crates.io/search?q=checksum and am surprised that the highest seems to be at 25k downloads. (I was expecting something in the millions.)
I need to do some checksumming. The threat model is not adversarial (I don't need a cryptographic hash function). The threat model is hardware failure / corruption.
Any recommendations?
Check out: https://crates.io/crates/digest. It contains instructions on which crate to use for which checksum type. The category to look for is "Cryptography" (https://crates.io/categories/cryptography). https://crates.io/crates/sha2 has 27M downloads.
You might want to search for "hash" or "hasher" and shop around.
Do hash functions actually work? I.e., I think checksums can make guarantees of the form: if fewer than N bits are flipped, we detect it.
Do hash functions actually guarantee this?
Some hash functions have such guarantees, others do not. For example, there's a special kind of hash function called a checksum that tends to have such a guarantee.
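To make the "if only a few bits are flipped, we detect it" guarantee concrete, here is a minimal sketch using CRC-32 (my choice of example; the thread does not name a specific checksum). CRC-32 is guaranteed to detect any single-bit error and any burst error of up to 32 bits. The helper `detects_all_single_bit_flips` is a hypothetical name that simply brute-forces the single-bit case for one buffer.

```rust
/// Bitwise CRC-32 (IEEE 802.3 polynomial, reflected form 0xEDB88320).
/// Slow but dependency-free; real code would use a table-driven crate.
fn crc32(data: &[u8]) -> u32 {
    let mut crc: u32 = 0xFFFF_FFFF;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            // All-ones mask if the low bit is set, zero otherwise.
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}

/// Hypothetical helper: flips every bit of `data` in turn and checks
/// that the CRC changes each time.
fn detects_all_single_bit_flips(data: &[u8]) -> bool {
    let original = crc32(data);
    let mut copy = data.to_vec();
    for i in 0..copy.len() {
        for bit in 0..8 {
            copy[i] ^= 1u8 << bit;
            let changed = crc32(&copy) != original;
            copy[i] ^= 1u8 << bit; // restore the original byte
            if !changed {
                return false;
            }
        }
    }
    true
}

fn main() {
    let block = b"The quick brown fox jumps over the lazy dog";
    println!("crc32 = {:08x}", crc32(block));
    println!(
        "all single-bit flips detected: {}",
        detects_all_single_bit_flips(block)
    );
}
```

Note that this is a guarantee by construction, not a probabilistic one: no single-bit flip can leave the CRC unchanged, regardless of the input.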
When it comes to cryptographic hash functions, they probably don't have exactly that guarantee, but on the other hand they tend to be so unpredictable that, for a function like SHA-256, not even a single collision has ever been found, even though in principle there could be a collision where only a single bit differs.
(For example, SHA-256 has 256 bits of output, which means that it can output only 2^256 different values. So if you take a string of 2^256 bits together with its 2^256 single-bit-flipped variants, you have 2^256+1 strings, and by the pigeonhole principle two of them must collide. Those two strings differ by at most two bits.)
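The counting argument is impossible to run at SHA-256 scale, but it can be sketched at toy scale. The following assumes a made-up 8-bit hash (`tiny_hash`, a folded FNV-1a, purely illustrative): a 256-bit string plus its 256 single-bit-flipped variants give 257 inputs but only 256 possible outputs, so two of them must collide, and any two such inputs differ in at most two bits.

```rust
use std::collections::HashMap;

/// Toy 8-bit hash (illustrative only): 32-bit FNV-1a, XOR-folded to one byte.
fn tiny_hash(data: &[u8]) -> u8 {
    let mut h: u32 = 0x811C_9DC5;
    for &b in data {
        h ^= b as u32;
        h = h.wrapping_mul(0x0100_0193);
    }
    (h ^ (h >> 8) ^ (h >> 16) ^ (h >> 24)) as u8
}

/// Looks for two inputs with the same hash among `base` and its
/// single-bit-flipped variants. Each side of the result is the index of
/// the flipped bit, or `None` for the unmodified base string.
fn find_collision(base: &[u8]) -> Option<(Option<usize>, Option<usize>)> {
    let mut seen: HashMap<u8, Option<usize>> = HashMap::new();
    seen.insert(tiny_hash(base), None);
    let mut buf = base.to_vec();
    for bit in 0..base.len() * 8 {
        buf[bit / 8] ^= 1u8 << (bit % 8);
        let h = tiny_hash(&buf);
        buf[bit / 8] ^= 1u8 << (bit % 8); // restore
        if let Some(&prev) = seen.get(&h) {
            return Some((prev, Some(bit)));
        }
        seen.insert(h, Some(bit));
    }
    None
}

fn main() {
    let base = [0u8; 32]; // 256 bits, so 257 candidate inputs in total
    match find_collision(&base) {
        Some((a, b)) => println!("collision between variants {:?} and {:?}", a, b),
        None => println!("no collision (cannot happen with 257 inputs)"),
    }
}
```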
The point is that "hash function" is not strong enough to guarantee the condition we want; perhaps there is a better name (i.e. 'checksum') for something that does provide the property we want.
xxhash with 128-bit output would be more than enough, and it's also simple to use.
Note that "checksum" to me implies something very different. Checksums are made to be good at detecting a certain kind of perturbation. For example, in computers they're often designed to detect the kinds of errors that are common in the physical layer of communication. On your credit card, the checksum is designed to catch off-by-one errors and digit transpositions that are commonly made by humans.
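As an illustration of a checksum tuned to human mistakes, here is a minimal sketch of the Luhn algorithm, assuming that is the credit-card scheme being referred to. The function name `luhn_valid` is mine, and the sample number is the usual textbook example rather than a real card number.

```rust
/// Returns true if `number` (ASCII digits only) passes the Luhn check.
fn luhn_valid(number: &str) -> bool {
    let mut sum = 0u32;
    let mut count = 0usize;
    // Walk the digits from right to left, doubling every second one.
    for (i, ch) in number.chars().rev().enumerate() {
        let d = match ch.to_digit(10) {
            Some(d) => d,
            None => return false, // reject anything that is not a digit
        };
        let v = if i % 2 == 1 {
            let doubled = d * 2;
            if doubled > 9 { doubled - 9 } else { doubled }
        } else {
            d
        };
        sum += v;
        count += 1;
    }
    count > 1 && sum % 10 == 0
}

fn main() {
    println!("{}", luhn_valid("79927398713")); // valid textbook example
    println!("{}", luhn_valid("79927398710")); // last digit mistyped
    println!("{}", luhn_valid("97927398713")); // first two digits swapped
}
```

Any single wrong digit is caught, as are most swaps of adjacent digits; the known blind spot is swapping an adjacent 0 and 9.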
They're thus very good at being different for certain kinds of small differences, but don't directly say anything about the distribution of their outputs nor worry about their behaviour for very-different inputs. For example, in a good cryptographic hash function all the output bits have about a 50% chance to change for a 1-bit difference in the input, but in a checksum it's plausible for certain input bits to only affect certain output bits, since one guaranteed change is sufficient for the corruption-detection goal -- and arguably better than just a probabilistic chance of a bunch of flips, for that use. One implication is that truncating a cryptographic hash is a reasonable operation, since the entropy is reliably spread across all the bits, but truncating a simpler hash can do far more damage than just the birthday bound impact. (This is why hash tables that use modulo-prime instead of modulo-power-of-two are more resilient to poor hash functions, for example.)
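The modulo-prime remark can be seen in a small sketch. It assumes a deliberately poor hash (the identity function, named `weak_hash` here for illustration) and keys that all share their low four bits, as 16-byte-aligned addresses would.

```rust
use std::collections::HashSet;

/// Deliberately poor "hash": the identity function.
fn weak_hash(key: u64) -> u64 {
    key
}

/// Counts how many distinct buckets the keys land in for a given table size.
fn buckets_used(keys: &[u64], table_size: u64) -> usize {
    keys.iter()
        .map(|&k| weak_hash(k) % table_size)
        .collect::<HashSet<u64>>()
        .len()
}

fn main() {
    // 64 keys that are all multiples of 16, so their low 4 bits are zero.
    let keys: Vec<u64> = (0..64u64).map(|i| i * 16).collect();
    println!("mod 16: {} bucket(s) used", buckets_used(&keys, 16));
    println!("mod 17: {} bucket(s) used", buckets_used(&keys, 17));
}
```

With 16 buckets, taking the value modulo 16 keeps only the low four bits, which are identical for every key, so everything piles into one bucket. With 17 buckets the same keys spread over all of them, because the prime modulus mixes in the higher bits.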
Now, a good checksum will have used its flexibility well, and so is likely to be unbiased in the distribution of its output, but only incidentally, not in a planned way.
I appreciate your effort, but I don't understand your point.
My point is: checksum != hashing; the two have different design goals. In hashing, we want to avoid collisions / even make it hard to compute a pre-image.
Checksums, given that the 'threat model' is hardware, not adversarial, have no such requirements.
To me, "hashing" (without further qualification) isn't adversarial, nor is it concerned with pre-image difficulty. For example, rustc is basically just a huge hash table benchmark, and doesn't care about pre-images and doesn't worry about adversarial input.
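To show what non-adversarial hashing looks like in practice, here is a minimal sketch that plugs a fast, non-cryptographic hasher into the standard `HashMap`. FNV-1a is swapped in only because it is short enough to write inline (as far as I know rustc itself uses a different fast hash, FxHash); `Fnv1a` and `FastMap` are names made up for this example.

```rust
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hasher};

/// 64-bit FNV-1a: fast and simple, with no resistance to crafted input.
struct Fnv1a(u64);

impl Default for Fnv1a {
    fn default() -> Self {
        Fnv1a(0xcbf2_9ce4_8422_2325) // FNV-1a 64-bit offset basis
    }
}

impl Hasher for Fnv1a {
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.0 ^= b as u64;
            self.0 = self.0.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
        }
    }

    fn finish(&self) -> u64 {
        self.0
    }
}

/// A HashMap that uses the hasher above instead of the default SipHash.
type FastMap<K, V> = HashMap<K, V, BuildHasherDefault<Fnv1a>>;

fn main() {
    let mut symbols: FastMap<&str, u32> = Default::default();
    symbols.insert("foo", 1);
    symbols.insert("bar", 2);
    println!("foo -> {:?}", symbols.get("foo"));
}
```

The trade-off is the one discussed above: the map gets cheaper hashing, and in exchange gives up the protection against deliberately chosen colliding keys that the default hasher provides.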