Implementing hash for a set datatype (i.e. multiple representations for Equal things)

Dear all,

I've implemented a Set datatype as a Vec. Equality is then defined as two Vecs having the same elements, but in possibly ordered differently in the Vecs. Elements in a set cannot be (totally) ordered, hence Equal sets cannot be written uniquely.

For example, if the element type is Prod(usize,usize) where
Prod(1,2) partial_cmp Prod(2,1) is None
then Set(Vec![Prod(2,1),Prod(1,2)]) partial_cmp Set(Vec![Prod(1,2),Prod(2,1)]) is Equal.

So far, so good. Now I'd like to have a HashMap of Sets, and thus require Set to be hashable.
Clippy rightly complains that I shouldn't derive Hash with my own PartialEq.

This is the obvious (wrong) hasher since it does not satisfy a == b => hash(a) == hash(b)

pub struct Set {
  set: Vec<Prod>
}
impl Hash for Set {
  fn hash<H: Hasher>(&self, hasher: &mut H) {
    for s in &self.set { s.hash(hasher); }
  }
}

I really can't think of an implementation of hash, other than the empty (very inefficient) hasher.

impl Hash for Set {
  fn hash<H: Hasher>(&self, hasher: &mut H) { }
}

I must be missing something?

Thanks,
Kees

One strategy would be to calculate the hash of each item independently and XOR them all together (or any other commutative+associative operator). I haven't looked at how Hashers operate enough to give advice about how to implement this, unfortunately.

2 Likes

I assume you'd create a Hasher for each item, call write and finish on each to get a u64, then XOR the u64s together and hash the result with the provided Hasher.

Anyway, just because the items don't implement Ord doesn't mean you can't provide an order for them for hashing. If you can find a way to assume a canonical ordering of elements, that should be a bit simpler than trying to detect which subsets of data can commute with each other.

2 Likes

Why don't you use the std::collections::BTreeSet<T>? It does impl Hash if T: Hash.

One of the things that you are missing is that you are not following the forum guidelines for posting code. Please read Forum Code Formatting and Syntax Highlighting and edit your initial post by using the pencil-shaped edit button under that post. Thanks. :clap:

In your example you could have defined a total order, it won't probably make sense, but you potentially could. Is this also the case for the actual code? If yes you can use that total order, just make sure it isn't exposed.

1 Like

Thanks everyone.

I have a type hierarchy containing Void (aka Bottom), Rational, Bounded & Infinite integers, Product with custom Orderings (strict, lexicographic, etc.), Sets of types, etc.
Therefore, in general the types and elements of those types are only partially ordered, and not totally ordered.
Representations are not nec. unique (as in the given example of a Set), and cannot be written in general into a canonical form.

The suggestion of a commutative & associative hashing function is excellent.
I'll either hash the sum of the elements or XOR the hashes of all elements.
Update: works perfectly, see code below.

Thanks,
Kees

fn hash_vec<T: Hash>(v: &[T]) -> u64{
  let mut xor = 0u64;
  for i in v.iter() {
    let mut h = DefaultHasher::new();
    i.hash(&mut h);
    xor ^= h.finish();
  }
  xor
}
impl Hash for NEveryMPoset {
  fn hash<H: Hasher>(&self, hasher: &mut H) {
    hasher.write_u64(hash_vec(&self.set))
  }
}
impl Hash for Budget {
  fn hash<H: Hasher>(&self, hasher: &mut H) {
    let h = 
      match self {
	  Budget::Void => { hasher.write(b"Void") },
 	  Budget::Product(s, o, p) => { s.hash(hasher); o.hash(hasher); p.hash(hasher); },
	  Budget::Budgets(s) => { hasher.write_u64(hash_vec(s)) },
	  Budget::Invert(b) => { b.hash(hasher); hasher.write(b"invert"); },
	  Budget::Property(s) => s.hash(hasher),
	  Budget::Bool(s, i) => { s.hash(hasher); i.hash(hasher); },
	  Budget::BoundedInf(s, i) => { s.hash(hasher); i.hash(hasher); },
	  Budget::Rational(s, i) => { s.hash(hasher); i.hash(hasher); },
	  Budget::NEveryMPoset(s, i) => { s.hash(hasher); i.hash(hasher); },
	  Budget::Set(_, s) => { hasher.write_u64(hash_vec(s)) },
      };
    println!("hash({}) = {}",self, hasher.finish()); // for demonstration purposes
    h
  }
}

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.