Are Hash and Hasher outputs guaranteed to be stable between rust versions?

Author of the rapidhash crate here. I'm wondering if it's possible to offer persistent hashing through std::hash::Hasher, so that the hash output is stable across platforms, Rust versions, and crate versions?

My reading of the std::hash::Hash docs is that the standard library hashing traits are not suitable for persistent hashing. That said, the last breaking change I'm aware of predates 1.0: the addition of a suffix byte when hashing strings.

The line in the std::hash::Hash docs, "or only rely on Hash and Hasher implementations that provide additional guarantees", is ambiguous about what those guarantees are, and might even be misleading, as I don't think the Hash trait (let alone its implementors) can provide enough guarantees for a Hasher implementation to offer them? Even if write_str were stable, my understanding is that the API would be guaranteed, but its default implementation would still be free to change in future Rust versions.

I'd like to be able to give the crate user base a persistent version of RapidHasher for ease of use, but if that's not possible, a clear way to explain why I can't would be helpful. Thank you!

The problem is that even if a Hash implementation is stable, Rust might still change what data a given type feeds to the hasher. For example, maybe a (i32, i32) starts providing the integers in the opposite order in a new rustc version.


What's the confusion here? If one controls both the concrete Hash impls and the Hasher then one controls in exactly which order data is fed to the latter and then what output they will produce together. And the implementations may choose to guarantee their exact behavior.
This of course isn't useful when dealing with impls one does not control, but well, those don't provide such guarantees and thus aren't covered by the statement.
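To make the "control both sides" point concrete, here is a minimal sketch. Assumptions: the hasher is 64-bit FNV-1a (picked only because it is short; it is not rapidhash), and the names Fnv1a64 and Point are hypothetical. The hand-written Hash impl calls only Hasher::write with explicitly little-endian bytes, so neither a std Hash impl nor a provided (default) Hasher method influences the result.

```rust
use std::hash::{Hash, Hasher};

/// Hypothetical portable hasher: 64-bit FNV-1a over the raw byte stream.
/// Only the required methods (`write`, `finish`) are implemented.
pub struct Fnv1a64 {
    state: u64,
}

impl Fnv1a64 {
    const OFFSET_BASIS: u64 = 0xcbf2_9ce4_8422_2325;
    const PRIME: u64 = 0x0000_0100_0000_01b3;

    pub fn new() -> Self {
        Fnv1a64 { state: Self::OFFSET_BASIS }
    }
}

impl Hasher for Fnv1a64 {
    fn write(&mut self, bytes: &[u8]) {
        // FNV-1a: XOR in each byte, then multiply by the prime.
        for &b in bytes {
            self.state ^= u64::from(b);
            self.state = self.state.wrapping_mul(Self::PRIME);
        }
    }

    fn finish(&self) -> u64 {
        self.state
    }
}

/// Hypothetical user type with a hand-written (not derived) Hash impl.
pub struct Point {
    pub x: i32,
    pub y: i32,
}

impl Hash for Point {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // Fixed field order and fixed little-endian encoding, chosen here
        // rather than by the standard library.
        state.write(&self.x.to_le_bytes());
        state.write(&self.y.to_le_bytes());
    }
}

fn main() {
    let mut hasher = Fnv1a64::new();
    Point { x: 1, y: 2 }.hash(&mut hasher);
    println!("{:#018x}", hasher.finish());
}
```

Because FNV-1a consumes bytes one at a time, splitting the input across several write calls gives the same result as one concatenated call; a hasher whose output depends on write boundaries would have to document that as part of its guarantee.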

It's clear in the docs that the standard Hash impls don't provide such stability:

Due to differences in endianness and type sizes, data fed by Hash to a Hasher should not be considered portable across platforms. Additionally the data passed by most standard library types should not be considered stable between compiler versions.
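On the Hasher side, the endianness and type-size issues can be handled by overriding the provided integer methods; in the current standard library their default implementations feed the integer's native-endian bytes at its native width (so write_usize writes 4 or 8 bytes depending on the target). A minimal sketch, assuming a hypothetical LeRecorder that only records its input so the byte stream can be inspected. Only a few write_* methods are overridden here; a real portable hasher would need to cover all of them, and it still cannot control what data each Hash impl chooses to send.

```rust
use std::hash::Hasher;

/// Hypothetical hasher that records the bytes it is fed (illustration only).
#[derive(Default)]
pub struct LeRecorder {
    pub bytes: Vec<u8>,
}

impl Hasher for LeRecorder {
    fn write(&mut self, bytes: &[u8]) {
        self.bytes.extend_from_slice(bytes);
    }

    // Always little-endian, whatever the host's byte order.
    fn write_u32(&mut self, i: u32) {
        self.write(&i.to_le_bytes());
    }

    fn write_u64(&mut self, i: u64) {
        self.write(&i.to_le_bytes());
    }

    // Widen to u64 so 32-bit and 64-bit targets feed identical bytes.
    fn write_usize(&mut self, i: usize) {
        self.write_u64(i as u64);
    }

    fn finish(&self) -> u64 {
        // Not a real hash: just the number of bytes recorded.
        self.bytes.len() as u64
    }
}

fn main() {
    let mut r = LeRecorder::default();
    r.write_u32(0x0102_0304);
    r.write_usize(1);
    println!("{:?}", r.bytes); // [4, 3, 2, 1, 1, 0, 0, 0, 0, 0, 0, 0]
}
```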

That's confusing. It's impl Hash for (i32, i32) that decides the order given to the hasher.
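One way to see that the type's Hash impl decides what the hasher receives is to feed std types to a hasher that simply records each write call. A sketch, assuming a hypothetical CallRecorder helper. Whatever it prints reflects the current std impls (and the current default Hasher methods) on the toolchain you run it with, which is exactly the data the docs say not to rely on.

```rust
use std::hash::{Hash, Hasher};

/// Hypothetical hasher that stores every `write` call as a separate chunk,
/// so the order and content of the data a Hash impl sends are visible.
#[derive(Default)]
struct CallRecorder {
    calls: Vec<Vec<u8>>,
}

impl Hasher for CallRecorder {
    fn write(&mut self, bytes: &[u8]) {
        self.calls.push(bytes.to_vec());
    }

    fn finish(&self) -> u64 {
        0 // Not a real hash; we only care about the recorded calls.
    }
}

fn main() {
    // Decided by std's `impl Hash for (i32, i32)` on this toolchain.
    let mut tuple_rec = CallRecorder::default();
    (1i32, 2i32).hash(&mut tuple_rec);
    println!("(1i32, 2i32) -> {:?}", tuple_rec.calls);

    // Decided by std's `impl Hash for str` and the default write_str path.
    let mut str_rec = CallRecorder::default();
    "ab".hash(&mut str_rec);
    println!("\"ab\" -> {:?}", str_rec.calls);
}
```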


You should look into what stable-hash accomplishes[1] and how.


  1. As in, I'm not sure if it gives you the guarantees you want or not.

This is what concerns me. We could reasonably assume that basic types like str, tuples, arrays, Vec, and derive(Hash) structs are unlikely to change between compiler versions, but without a guarantee from std, the crate can't provide a guarantee either.

Likewise, the Hasher trait could change. Hasher::write_str could still break the way strings are hashed by default, and there are discussions about changing that default behaviour. I think this is the strongest argument: even basic data types like str may see breaking changes between rustc versions.

or only rely on Hash and Hasher implementations that provide additional guarantees

I think this part of the docs is misleading. What can an implementor do if we can't even rely on the Hasher trait not to change the default str hashing behaviour between Rust versions?

*Edit: fixed a Hash/Hasher typo and mentioned the potential str hashing changes.

The Hash trait may only be changed in a backward-compatible fashion (e.g. by adding a new method with a default implementation).

But the stability of the values fed through that trait's methods is guaranteed by whoever implements those methods, not by the trait itself.

Most implementations in the standard library don't provide stability (and that's a good thing!), ergo you need implementations that don't use them.

But they could still implement the exact same trait, why not?

Uhm. What exactly would prevent it from providing such guarantees for some, or maybe even all, implementations?

How is write_str ever related to the Hash trait? Sure, if you tried to use the generic implementations provided by the standard library you would be in trouble, but that's exactly what this snippet is about: you can implement Hash for your own types without ever using any implementation provided by the standard library (nothing stops you, right?), and if you do that, everything should work.

Of course, at that point it's probably better to just have a separate trait, unrelated to the std-provided Hash… but technically this may work.

Apologies, I'm mixing up Hash and Hasher there.

Suppose Hasher::write_str remained an unstable API feature, but its default str hashing behaviour changed to, let's say, use write_length_prefix instead. A Hasher implementation that is otherwise stable would then produce different output between Rust versions for the basic str type.

I'd consider that non-portable between Rust versions. As far as I'm aware, there's nothing the Hasher could do on stable Rust to output the same str hash across compiler versions?

You are allowed to override a provided method when implementing a trait, so a theoretical impl Hasher for StableHasher could define its own write_str behavior and guarantee its stability. What it can't guarantee is that str won't, for example, start doing length-prefixing directly in its own Hash impl.

Well yes, the text says you need to use impls with additional guarantees, and impl Hash for str is not one of those. This follows from the previous "not considered stable" statement.

You can write a struct StableHashStr(&str) newtype wrapper with a custom Hash impl, for example. Or go through str.as_bytes(), which you can pass directly to your hasher's write method without being at the mercy of the standard library.
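A minimal sketch of the StableHashStr idea. The encoding (an 8-byte little-endian length prefix followed by the UTF-8 bytes) is an assumption chosen for illustration, as is the hypothetical Collect helper that just gathers the byte stream. Because the impl only calls Hasher::write, neither impl Hash for str nor the unstable write_str is involved.

```rust
use std::hash::{Hash, Hasher};

/// Newtype wrapper whose Hash impl bypasses `impl Hash for str`.
pub struct StableHashStr<'a>(pub &'a str);

impl Hash for StableHashStr<'_> {
    fn hash<H: Hasher>(&self, state: &mut H) {
        let bytes = self.0.as_bytes();
        // Assumed encoding: 8-byte little-endian length, then the UTF-8 bytes.
        state.write(&(bytes.len() as u64).to_le_bytes());
        state.write(bytes);
    }
}

/// Hypothetical helper that collects the byte stream instead of hashing it.
#[derive(Default)]
struct Collect(Vec<u8>);

impl Hasher for Collect {
    fn write(&mut self, bytes: &[u8]) {
        self.0.extend_from_slice(bytes);
    }

    fn finish(&self) -> u64 {
        0 // Not a real hash.
    }
}

/// Feeds several strings, one after another, through the wrapper.
fn stream(parts: &[&str]) -> Vec<u8> {
    let mut c = Collect::default();
    for &part in parts {
        StableHashStr(part).hash(&mut c);
    }
    c.0
}

fn main() {
    // Without the length prefix, both sequences would feed the bytes "abc".
    assert_ne!(stream(&["ab", "c"]), stream(&["a", "bc"]));
    println!("{:?}", stream(&["ab"])); // [2, 0, 0, 0, 0, 0, 0, 0, 97, 98]
}
```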

So I've been convinced by the above that, with great care, it's possible to offer stable, portable hashing across platforms and compiler versions via std::hash::Hash and std::hash::Hasher.

To ensure stable and portable hashing, end users need to:

  • Ensure their chosen Hasher is portable and promises to be stable between Rust and crate versions.
  • Avoid derive(Hash) and implement Hash themselves using Hasher::write_* methods.
  • Avoid using Hash::hash on types they haven't manually implemented, including primitives like str and tuples.
  • Avoid Hasher::write_* methods with default implementations (particularly the upcoming write_str); checking this requires reading the Hasher implementation's source code.
  • Manually hash each field of a tuple and each element of a collection.
  • Know how to structure the hashed input to avoid reordering or length-extension attacks, etc., if their use case requires it.
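Applied to a small struct, the checklist might look like the sketch below. Record and Collect are hypothetical names; the field order, little-endian encoding, and u64 length prefix are assumptions that a real crate would have to document and then never change. Only Hasher::write is called, so no derive(Hash), std Hash impl, or provided write_* method is involved.

```rust
use std::hash::{Hash, Hasher};

/// Hypothetical byte-collecting hasher, used only to inspect the stream.
#[derive(Default)]
struct Collect(Vec<u8>);

impl Hasher for Collect {
    fn write(&mut self, bytes: &[u8]) {
        self.0.extend_from_slice(bytes);
    }

    fn finish(&self) -> u64 {
        0 // Not a real hash.
    }
}

/// Hypothetical user type with a tuple field and a collection field.
struct Record {
    range: (u32, u32),
    values: Vec<u32>,
}

impl Hash for Record {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // Tuple: each field explicitly, in a fixed order.
        state.write(&self.range.0.to_le_bytes());
        state.write(&self.range.1.to_le_bytes());
        // Collection: explicit u64 length prefix, then each element in order.
        state.write(&(self.values.len() as u64).to_le_bytes());
        for v in &self.values {
            state.write(&v.to_le_bytes());
        }
    }
}

fn main() {
    let r = Record { range: (1, 2), values: vec![3] };
    let mut c = Collect::default();
    r.hash(&mut c);
    // [1, 0, 0, 0, 2, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0]
    println!("{:?}", c.0);
}
```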

Portable hashing libraries (including rapidhash) could do a better job of documenting how careful users need to be in this case. However, it would be a long section, and the risk of user mistakes makes me wonder whether it's a bad idea to offer portable hashing through the standard library traits at all.

I've drafted a portable-hash crate that intends to provide PortableHash and PortableHasher traits (plus a derive macro in the future) that promise stability across platforms and compiler versions. Hopefully this will be safer and easier for end users of portable hashers. All thoughts on the idea are welcome.
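For comparison, a separate trait pair might look roughly like the sketch below. This is a hypothetical design for illustration only, not the API of the portable-hash crate: the point is that the provided methods' encodings and the impls for common types are defined by the trait's own crate, so they can be documented and frozen instead of being left to std.

```rust
/// Hypothetical hasher trait whose provided methods have fixed encodings.
pub trait PortableHasher {
    fn write(&mut self, bytes: &[u8]);
    fn finish(&self) -> u64;

    /// Always 4 little-endian bytes.
    fn write_u32(&mut self, i: u32) {
        self.write(&i.to_le_bytes());
    }

    /// Always 8 little-endian bytes, regardless of pointer width.
    fn write_usize(&mut self, i: usize) {
        self.write(&(i as u64).to_le_bytes());
    }

    /// Length prefix, then the UTF-8 bytes.
    fn write_str(&mut self, s: &str) {
        self.write_usize(s.len());
        self.write(s.as_bytes());
    }
}

/// Hypothetical counterpart to std::hash::Hash.
pub trait PortableHash {
    fn portable_hash<H: PortableHasher>(&self, state: &mut H);
}

impl PortableHash for u32 {
    fn portable_hash<H: PortableHasher>(&self, state: &mut H) {
        state.write_u32(*self);
    }
}

impl PortableHash for str {
    fn portable_hash<H: PortableHasher>(&self, state: &mut H) {
        state.write_str(self);
    }
}

impl<T: PortableHash + ?Sized> PortableHash for &T {
    fn portable_hash<H: PortableHasher>(&self, state: &mut H) {
        (**self).portable_hash(state);
    }
}

impl<A: PortableHash, B: PortableHash> PortableHash for (A, B) {
    fn portable_hash<H: PortableHasher>(&self, state: &mut H) {
        // Declaration order is part of the documented contract.
        self.0.portable_hash(state);
        self.1.portable_hash(state);
    }
}

impl<T: PortableHash> PortableHash for [T] {
    fn portable_hash<H: PortableHasher>(&self, state: &mut H) {
        state.write_usize(self.len());
        for item in self {
            item.portable_hash(state);
        }
    }
}

/// Byte-collecting PortableHasher used only to inspect the stream.
#[derive(Default)]
struct Collect(Vec<u8>);

impl PortableHasher for Collect {
    fn write(&mut self, bytes: &[u8]) {
        self.0.extend_from_slice(bytes);
    }

    fn finish(&self) -> u64 {
        0 // Not a real hash.
    }
}

fn main() {
    let mut c = Collect::default();
    (1u32, "a").portable_hash(&mut c);
    println!("{:?}", c.0); // [1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 97]
}
```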

