Nested enum behaviour of Hashing

I got an example like this:

use std::collections::HashSet;
use std::hash::Hash;
#[derive(PartialEq, Eq, Hash)]
enum Four {
    FourPointOne,
    FourPointTwo,
    FourPointThree,
}
#[derive(PartialEq, Eq, Hash)]
enum Five {
    FivePointOne,
}
#[derive(PartialEq, Eq, Hash)]
enum Foo {
    One,
    Two,
    Three,
    Four(Four),
    Five(Five)
}
fn main() {
    let i1 = Foo::Four(Four::FourPointOne);
    let i2 = Foo::Four(Four::FourPointTwo);
    let i3 = Foo::Five(Five::FivePointOne);

    let mut set: HashSet<Foo> = HashSet::new();
    set.insert(i1);
    if set.insert(i2) == false {
        println!("already have");
    }
    if set.insert(i3) == false {
        println!("already have");
    }
}

In this case, can I safely consider that all elements in all enum types can be hash without problem? Or in another word, can I consider that the hashing behave the same with

//put all enum fields together
#[derive(PartialEq, Eq, Hash)]
enum Bar {
    One,
    Two,
    Three,
    FourPointOne,
    FourPointTwo,
    FourPointThree,
    FivePointOne,
}

?

The automatically derived Hash implementation will behave properly, i.e. different values produce different data that’s fed to the hasher. However the precise value given to the hasher will differ.

The nested version, Foo, will hash the discriminant in Foo, optionally followed by a discriminant for the contained type. I.e.

  • Foo::One feeds [0] to the hasher
  • Foo::Two feeds [1] to the hasher
  • Foo::Three feeds [2] to the hasher
  • Foo::Four(FourPointOne) feeds [3, 0] to the hasher
  • Foo::Four(FourPointTwo) feeds [3, 1] to the hasher
  • Foo::Four(FourPointThree) feeds [3, 2] to the hasher
  • Foo::Five(FivePointOne) feeds [4] to the hasher

whereas for Bar,

  • Bar::One feeds [0] to the hasher
  • Bar::Two feeds [1] to the hasher
  • Bar::Three feeds [2] to the hasher
  • Bar::FourPointOne feeds [3] to the hasher
  • Bar::FourPointTwo feeds [4] to the hasher
  • Bar::FourPointThree feeds [5] to the hasher
  • Bar::FivePointOne feeds [6] to the hasher
3 Likes

What do you mean by "without problem"? They can all be hashed without panicking or UB, if the hasher is written correctly (and the default hasher, used by standard library, can be trusted to be written correctly). There are no guarantees at all what values they will hash to - the hasher can even be pseudo-random, i.e. when you ran the program several times, it can calculate different hashes for the same values, because it, for example, will be seeded at startup with the random value.

I'm afraid this is not guaranteed.

Note however, that this code:

    set.insert(i1);
    if set.insert(i2) == false {
        println!("already have");
    }
    if set.insert(i3) == false {
        println!("already have");
    }

...does not rely on all the i1, i2 and i3 having different hashes, since HashSet checks the equality of values when it finds the collision. And i1, i2 and i3 are clearly not equal under the default (derived) implementation of Eq, since they have different discriminants on some level.

2 Likes

Here’s a straightforward practical demonstration/reproduction of this table: Rust Playground

1 Like

I mean the case that even though there's no Foo::Four(FourPointTwo) in set, but the algorithm considered Foo::Four(FourPointTwo) and Foo::Four(FourPointTwo) are the same, and return false when calling insert

Note that, as @Cerber-Ursi explained above, HashSet doesn’t work with Hash alone, in fact, the Hash is merely an optimization, and the important part is how the (Partial)Eq implementation operates. Hash has to be well-behaved though in that it

  • must produce the same data to be hashed, for values that compare as “equal” using the (Partial)Eq implementation; failing to meet this condition can result in a HashMap or HashSet actually misbehaving and reporting a value is not part of the collection, even though it ought to be.

additionally, a Hash implementation

  • should produce the different data to be hashed, for values that compare as “not equal” using the (Partial)Eq implementation; failing to do this can result in HashMap/HashSet or similar data structures using this type to be terribly inefficient. (There’s actually an even more precise condition regarding “prefix collisions”, see the documentation)

In any case, the interesting question is whether or not Foo::Four(FourPointOne) == Foo::Four(FourPointTwo) is true or false. As you can easily test, it’s false, with the derived PartialEq and Eq implementation you have, so .insert should consider these two values as different and never report "already have" in your example code.

3 Likes

(Non-cryptographic) hashes in general are expected to have collisions, and any hashing data structure must handle eventual collisions. If you implement a hasher that always returns the same value, HashSet and HashMap etc. must and will still work correctly (albeit performance will suffer). This has nothing to do with enums or derive macros – this is just how hashing works.

2 Likes

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.