Implementing filter on all generic iterators

I'm not new to programming by any means but am pretty new to Rust. I really like certain things about the language, but it seems whenever I try to do something slightly more advanced I run into issues. I'm finding that I do not yet know how to read the rust error messages and docs like I can in (i.e. C++ or Typescript).

I was hoping to assert my own understanding of how to use the language by implementing a filter_none on iterables that contain options.

To start off--I decided to see if I can do what I want to begin with:

fn filter_none<'a, T>(item: &'a &Option<T>) -> bool {
  match item {
    None => false,
    Some(_) => true,
  }
}

fn main() {
  let vec: Vec<Option<i32>> = vec![None, Some(1), Some(2), None, Some(3)];
  let filtered_vec: Vec<&Option<i32>> = vec
    .iter()
    .filter(filter_none)
    .collect();
  println!("{:?}", filtered_vec);
}

This works. Now to implement this as a trait on iterables I attempted to add the following code:

use std::slice::Iter;
use std::iter::Filter;

fn filter_none<'a, T>(item: &'a &Option<T>) -> bool {
  match item {
    None => false,
    Some(_) => true,
  }
}

trait FilterNone<T> {
  // Have it seamlessly return a filter object
  fn filter_none<P>(self) -> Filter<Self, P>
    where
      Self: Iterator<Item = Option<T>> + Sized,
      P: FnMut(&Option<T>) -> bool,
    {
    // apply filter_none as a filter and return
    self.filter(filter_none::<'a, T>)
  }
}

// Implement FilterNone on all iterables of Option<T>
// pass along lifetime for filter_none
impl<'a, T> FilterNone<T> for Iter<'a, Option<T>> {}

fn main() {
  let vec: Vec<Option<i32>> = vec![None, Some(1), Some(2), None, Some(3)];
  let filtered_vec: Vec<&Option<i32>> = vec
    .iter()
    .filter_none()
    .collect();
  println!("{:?}", filtered_vec);
}

After many iterations I seem to be unable to properly match the type of the Filter callable with either a static function or a closure. I imagine it might have something to do with a missing lifetime but I'm clearly missing something.

It would be very helpful for me if someone could help explain what's missing to accomplish what I'm trying to do and why. Thank you.

Normally when defining an extension trait (a trait that adds methods to an existing trait/struct) you'll define a new trait with that method, then put the implementation of that trait in some impl<T, I: Iterator<Item=T>> FilterNone<T> for I { ... } block.

use std::iter::Filter;

// to make the filter_none() return type more readable.
type IsSome<T> = fn(&Option<T>) -> bool;

trait FilterNone<T>: Sized {
    fn filter_none(self) -> Filter<Self, IsSome<T>>;
}

impl<T, I> FilterNone<T> for I
where
    I: Iterator<Item = Option<T>> + Sized,
{
    fn filter_none(self) -> Filter<Self, IsSome<T>> {
        self.filter(|item| item.is_some())
    }
}

fn main() {
    let items = vec![None, Some(1), Some(2), None, Some(3)];

    let somes: Vec<_> = items.into_iter().filter_none().collect();
    println!("{:?}", somes);
}
2 Likes

In your current example, the problem is that within fn filter_none, you have a generic parameter P for the filter predicate function. When you have a generic parameter, it means that the caller can choose what concrete type to use.

In this case, the caller can't actually choose because there is only one type being used and it is determined by the function body. It's the type of the function you're passing to filter().

Since in your situation, the following things are true:

  • There is exactly one concrete type you want to use;
  • You want the caller to know it implements a trait;(namely FnMut); and
  • You don't want to name it,

The feature impl Trait in return position is exactly what you need here.

Using impl Trait, I would try to rewrite your function signature from

fn filter_none<P>(self) -> Filter<Self, P>
    where
      Self: Iterator<Item = Option<T>> + Sized,
      P: FnMut(&Option<T>) -> bool;

to

fn filter_none(self) -> Filter<Self, impl FnMut(&Option<T>) -> bool>
  where
    Self: Iterator<Item = Option<T>> + Sized;

or even better

fn filter_none(self) -> impl Iterator<Item=Option<T>>
  where
    Self: Iterator<Item = Option<T>> + Sized;

Unfortunately, this is not possible today. This would require the feature "impl Trait in return positions of trait functions," which is still being developed.

With the easy solution of using impl Trait unavailable, we're left with a few options:

  • Name the type of function we're using (in this case a function pointer, fn(&Option<T>) -> bool
  • Avoid naming the type of our iterator at all by Box-ing and dynamically dispatching it
  • Avoid returning a Filter (and hence avoid naming the function type) by creating a new struct that acts as a specialized version of filter and manually implement Iterator for it.

The first option seems very promising. The problem is that it comes with a runtime cost. fn pointers are slower and take up more space than the corresponding Fn implementors.

This is because, under the hood, every function has a zero-size struct associated with it that implements the trait Fn. It does this by creating a method that just returns the answer given by said function.

However, as soon as you require that the function be represented as a pointer (by using the fn(_) -> _ or "function pointer" types), the compiler can no longer take advantage of the zero-size struct.

For the reasons above, if you do want zero runtime-cost, we can rule out option one. Otherwise, it's a serious consideration because it takes less code and has a smaller runtime-cost than the other costly option.

Let's also rule out option two because this one has a runtime-cost as well. It should be clearer this time why that is. As I hinted to in its description, it's because it requires an allocation and dynamic trait dispatch.

That leaves option three: create a custom Iterator struct. Here is how I would write that code:

struct FilterNone<I> {
    inner: I,
}

impl<T, I: Iterator<Item = Option<T>>> Iterator for FilterNone<I> {
    type Item = Option<T>;

    fn next(&mut self) -> Option<Self::Item> {
        loop {
            let item = self.inner.next()?;
            if item.is_some() {
                return item;
            }
        }
    }
}

fn filter_none<T, I: Iterator<Item = Option<T>>> (iter: I) -> FilterNone<I> {
    FilterNone { inner: iter }
}

// Go on to use an extension trait to add a method to all Iterators using filter_none above

Besides answering your question, I do have a few tips based on what I noticed in your code:

  • In the where clause of your iterator adaptor filter_none, you include a + Sized bound. This is strictly unnecessary because requiring Sized is already the default. In fact, you have to opt out of it with + ?Sized if you can accept a DST there.
  • Your standalone function filter_none is the same as Option::is_some
2 Likes

I appreciate greatly both of your posts; it really helped me understand a few more things about rust.

It seems the biggest roadblock I was experiencing was what @timotree3 noted -- your first suggestion is what I tried to no avail, nice to know it might be implemented in the future; and what was suggested by @Michael-F-Bryan through his code, key was this:

type IsSome<T> = fn(&Option<T>) -> bool;

Which changed everything--finally allowing the type checker to allow my defined function. My final code:

use std::iter::Filter;

type IsSome<T> = fn(&Option<T>) -> bool;

trait FilterNone<T> {
  fn filter_none(self) -> Filter<Self, IsSome<T>> where
    Self: Iterator<Item = Option<T>> + Sized {
    self.filter(<Option<T>>::is_some)
  }
}

impl<T, I> FilterNone<T> for I
where I: Iterator<Item = Option<T>> {}

fn main() {
  let vec: Vec<Option<i32>> = vec![None, Some(1), Some(2), None, Some(3)];
  let filtered_vec: Vec<_> = vec
    .into_iter()
    .filter_none()
    .collect();
  println!("{:?}", filtered_vec);
}

The other approaches and information provided by @timotree3 was also very useful; thank you.

1 Like

Glad you feel like your issue is resolved! :slight_smile: Note though that in your final solution the filter_none abstraction is not zero-cost. The function pointer must be stored and indirected through.

1 Like

Thank you for clarifying, i guess i really didn't think about the fact that relying on the filter function which takes a function pointer will ultimately result in that function pointer being stored for the lifetime of the filter structure, hence your approach to constructing a new iterator which wouldn't need to store any function pointer. Clearly, I've spent far too much time with high level languages where functions as arguments are abused.

Parameterizing something with a function is usually zero-cost. As I wrote above in my long post

If you were able to use impl Trait like you wanted to, it would indeed be zero-cost since the compiler can take advantage of the zero-size struct.

Let me try saying what I said above differently:

In Rust, when you pass a function using a generic type, the compiler makes the resulting code as good as if you had made a copy of the thing you're passing a function to, and substituted all the calls to the function argument with concrete calls to the real function.

This is because all generics are monomorphisized. Meaning, that a different version is generated for each combination of type parameters. Since the function you passed has its own unique type, the compiler will generate specific versions of the generic thing that are programmed specifically to call your function -- no runtime overhead.

However, the type that is unique to each function is unnameable. In order to get these zero-cost benefits, you need to allow the compiler to infer the name of the function type.

With all that background, it should now be clear that using impl Trait is zero-cost because it allows the compiler to infer the name of the type. But using a function pointer (syntax fn(_) -> _ in type position) is not zero-cost because the compiler can't monomorphisize and remove the indirection.

In other words, passing around functions is always zero-cost, unless you have to name their type, at which point it requires dynamic dispatch.

1 Like

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.