Abstract contains method

Dear reader,

I was working on some code where I wanted to generalize over set membership through a sequential scan and hashtable lookup. I tried writing a trait with a contains method but got in trouble when trying to actually use it.

trait CollectionContains<T> {
    fn contains(&self, item: &T) -> bool;
}

impl<T> CollectionContains<T> for Vec<T>
where
    T: PartialEq,
{
    fn contains(&self, item: &T) -> bool {
        <[T]>::contains(self, item)
    }
}

fn plain<'a>(x: &mut Vec<String>, keep: Vec<&'a str>) {
    x.retain(|item| keep.contains(&item.as_str()));
}

fn generic<'a>(x: &mut Vec<String>, keep: impl CollectionContains<&'a str>) {
    //     -- lifetime `'a` defined here
    x.retain(|item| keep.contains(&item.as_str()));
    //        ----  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    //        |     |
    //        |     `item` escapes the closure body here
    //        |     argument requires that `'1` must outlive `'a`
    //        `item` is a reference that is only valid in the closure body
    //        has type `&'1 String`
}

fn main() {
    plain(&mut vec!["a".to_string(), "b".to_string()], vec!["a"]);
    generic(&mut vec!["a".to_string(), "b".to_string()], vec!["a"]);
}

playground

The plain function is just there to show the similarity. I don't understand what is different between the concrete type in plain and the bounds I placed on generic, and if the bounds can be adjusted to match what happens for the concrete type. I tried changing the signature to:

fn generic(x: &mut Vec<String>, keep: impl for<'a> CollectionContains<&'a str>) {

But that leads to the following at the call-site:

error: implementation of `CollectionContains` is not general enough
  --> openapi-processor/src/bin/help.rs:31:5
   |
31 |     generic(&mut vec!["a".to_string(), "b".to_string()], vec!["a"]);
   |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ implementation of `CollectionContains` is not general enough
   |
   = note: `Vec<&'2 str>` must implement `CollectionContains<&'1 str>`, for any lifetime `'1`...
   = note: ...but it actually implements `CollectionContains<&'2 str>`, for some specific lifetime `'2`

Rather than defining this trait, I could pass in a contains_fn closure. That might be a superior solution anyway, but I can't help but wonder why this approach is hard and how to solve it.

keep: Vec<&'a str> is covariant in 'a, so it can coerce down to a lifetime local to the closure body that you pass to retain. In the generic case, there's no way to exploit variance (my implementing type may not even have a generic parameter but still implement the required bound).

That would normally be the bound you need, but it doesn't work because your implementation for Vec<T> can only check for T specifically.

The problem is, in the closure you only have a lifetime that's shorter than the generic function body, and there's no way for the caller to meet a bound on that local lifetime specifically. When you need some bound related to a local lifetime, the usual fix is to go with the higher-ranked for<'a> bound, and require the implementation to work with all lifetimes.

But that doesn't work with the "can only look up T if you have a T specifically" implementation. Nothing implemented CollectionContains<&'a str> for all lifetimes 'a, they only implemented it for the specific lifetime in their collection.

Perhaps also phrased as: Trait, trait parameters, trait implementations, and trait bounds don't have or consider variance.

IMO you need something akin to how you can look up keys in HashMaps without the specific key type.

Like so. I didn't double-check my bounds approach, just approximated HashMap from memory, so you might want to do the comparison yourself, or compare it to hashbrown's more flexible approach.

2 Likes

Given the explanation by @quinedot (which was not obvious to me at first glance), this also works: Rust Playground

IMHO the error message "borrowed data escapes outside of closure" (and "item escapes the closure body here") doesn't point to the root of the problem very well here. Not that I would know how to improve it.

1 Like

Ah that is a good solution and one I resisted. I felt like Vec<T> should implement CollectionContains<T>, because in the general case I might want multiple collection methods on a CollectionExt<T> trait. With this approach, it would start to look a bit odd if I added a generic add method:

trait CollectionExt<T, Q: ?Sized> {
    fn add(&mut self, item: T);
    fn contains(&self, item: &Q) -> bool;
}

impl<T, Q> CollectionExt<T, Q> for Vec<T>
where
    T: std::borrow::Borrow<Q>,
    Q: ?Sized + PartialEq,
{
    fn add(&mut self, item: T) {
        self.push(item);
    }
    
    fn contains(&self, item: &Q) -> bool {
        self.iter().map(T::borrow).any(|q| q == item)
    }
}

fn plain<'a>(x: &mut Vec<String>, keep: Vec<&'a str>) {
    x.retain(|item| keep.contains(&item.as_str()));
}

fn generic<'a>(x: &mut Vec<String>, keep: impl CollectionExt<&'a str, str>) {
    x.retain(|item| keep.contains(&item.as_str()));
}

See impl CollectionExt<&'a str, str> in particular. You could argue that each method should get its own trait. That would make it clearer what generic parameter is used for what operation. It would also allow the consumer to make any combination. At the same time, they would need to place more trait bounds, and in the extreme case it would necessitate a pseudo trait alias like num_traits provides.

What are your thoughts on this?

That could be like this:

fn generic<T>(x: &mut Vec<String>, keep: impl CollectionExt<T, str>) {

...since generic doesn't care what the actual contents are.

I think there are a number of options with various tradeoffs and no golden solution.

  • If you don't care about dyn compatibility, you could put Q on contains instead of on the entire trait
  • If you keep it on the trait, you could give it a default of T
  • If you make CollectionContains<Q> its own trait, CollectionContains<T> could be a supertrait of Collection<T>
  • You could use the contains in your OP and also have a callback-style contains_by

I don't think there's any getting around the tradeoffs, so the best choice depends on how you want to hold it. There's other tradeoffs with such a broadly targeted trait too, like how does HashMap work with contains and add?[1]

Ultimately I think there is no Collection trait in std[2] because you have to compromise too much to fit all the common collections into the same toolbox trait. Traits end up targeting a more narrow use case than "I own many of something", like "I need to do specific thing".[3]

Those specific things can still be collection-esque in nature, and we do have some of those type of traits, like IntoIterator and Extend. Having actually wrote that down, maybe what I should be suggesting is: do an inventory of the popular collection-related traits[4] that do exist and see if you can find inspiration or common ground. There's at least partially a "trait per method" ecosystem that already exists.[5]


  1. Takes a tuple? Adding (0, 2) would silently eject (0, 1).... ↩︎

  2. or a de factor standard one ↩︎

  3. "Everything I can do with many of something" isn't specific enough. ↩︎

  4. and maybe even common collection inherent methods ↩︎

  5. e.g. Extend::<Item>::extend_one is analogous to add (but Item need not be T, so it's fine for HashMap<K, V>) ↩︎

1 Like