Using hashset.contains with tuple types without takeing ownership of the values

When you are storing a tuple of values, you can only call contains by passing a reference to a tuple that own the values.

fn contains(s: &HashSet<(i32,i32)>, a: &i32, b: &i32) -> bool {
    s.contains(&(*a, *b))
}

To avoid the copy to accept tuples.

fn contains(s: &HashSet<(i32,i32)>, v: &(a: i32, b: i32)) -> bool {
    s.contains(&v)
}

The issue is when you are doing a API and both the tuple and HashSet are some internal implementation you don't want to leak. I basically have to re-rewrite mostly of the code if I want to test if can optimize with a HashSet with tuple.

Do we have any trick to deal with it? Maybe a custom implementation of Partial or Eq?

For context: My case I was playing with a grid of rooms and portals between rooms. So user can check if aportal exist giving 2 coordinates, I would just check a contains in a hashmap with a custom struct that Eq,Ord,Hash of (a,b) == (b,a).

Rust playground:

Why are you passing references to integers? Just use them by value. And I wouldn't worry about constructing a tuple on the fly. It's always cheap.

fn contains(s: &HashSet<(i32,i32)>, a: i32, b: i32) -> bool {
    s.contains(&(a, b))
}

@RustyYato i32 is is just a simplification. You can assume to be a string, matrix or some complex type that is not so cheap.

  1. Create a trait Query with methods exposing the data you want to search by:
    trait Query {
        fn a(&self) -> &i32;
        fn b(&self) -> &i32;
    }
    
  2. Implement Hash, PartialEq and Eq for dyn Query. When you do this it is important that the implementations are consistent with Hash and PartialEq of your value type, (i32, i32). For tuples this is easy. For more complicated types you may need to create a custom wrapper for the value type so that you can make the implementations agree exactly.
    impl std::hash::Hash for dyn Query + '_ {
        fn hash<H: std::hash::Hasher>(&self, hasher: &mut H) {
            self.a().hash(hasher);
            self.b().hash(hasher);
        }
    }
    impl std::cmp::PartialEq for dyn Query + '_ {
        fn eq(&self, other: &Self) -> bool {
            self.a() == other.a() && self.b() == other.b()
        }
    }
    impl std::cmp::Eq for dyn Query + '_ {}
    
  3. Implement Query and Borrow<dyn Query> for your value type, (i32, i32).
    impl Query for (i32, i32) {
        fn a(&self) -> &i32 { &self.0 }
        fn b(&self) -> &i32 { &self.1 }
    }
    impl<'any> std::borrow::Borrow<dyn Query + 'any> for (i32, i32) {
        fn borrow(&self) -> &(dyn Query + 'any) {
            self
        }
    }
    
  4. For each additional type you want to be able to look up in the map, implement Query.
    impl Query for (&i32, &i32) {
        fn a(&self) -> &i32 { self.0 }
        fn b(&self) -> &i32 { self.1 }
    }
    
  5. Finally, when you use set functions like contains and get that take a generic &Q, specify dyn Query.
    s.contains::<dyn Query>(&(a, b))
    // or: s.contains(&(a, b) as &dyn Query)
    

Playground.

Credit for the idea to @kennytm on Stack Overflow.

This approach has several potential performance issues which you should be aware of. First and most obviously, it's a total waste to do this with cheaply copied types like i32. It would be more reasonable with strings or more expensive types. Even so, .a() and .b() are indirect calls and you have to make at least 6 of them to look up a member of the set if it's present (2 if it's not). You can reduce the number of virtual calls by designing Query more intelligently. Devirtualization is possible, but in my experience LLVM hardly ever devirtualizes anything in Rust, so I wouldn't count on it.

If the value type is not feasible to clone, this might be a good approach. If performance is a bottleneck, you should profile this solution against other options like HashSet<(Cow<i32>, Cow<i32>)> or using a different data structure entirely.

2 Likes

Another option is to use the raw entry api with no overhead, you can do this on stable via the hashbrown crate. Note: std hashmap/set is implemented in terms of hashbrown.

3 Likes

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.