Beginner confused with mismatched opaque types

Hi! I'm a beginner looking for some help. I tried to provide context, but if you just want to read the core problem, skip to the last section for a minimal example.

Any insight would be greatly appreciated. Thanks a lot!

Some context

I'm going through the Rust book, and I just learned about Iterators on chapter 13. Then, on section 13.3, there is a suggested modification to the project in chapter 12. This is suggested, but no code is given for it. Specifically, I mean this paragraph:

For a further improvement, return an iterator from the search function by removing the call to collect and changing the return type to impl Iterator<Item = &'a str> so that the function becomes an iterator adapter. Note that you’ll also need to update the tests! Search through a large file using your minigrep tool before and after making this change to observe the difference in behavior. Before this change, the program won’t print any results until it has collected all of the results, but after the change, the results will be printed as each matching line is found because the for loop in the run function is able to take advantage of the laziness of the iterator.

I tried to implement this as an exercise, and found the following issue.

The problem

The project has two similar functions, search and search_case_insensitive. Which one is called depends on a boolean flag, so there is an if-else statement to handle this.

However, with the modifications above, both functions return the iterator resulting from the filter method. The book suggest to treat this as an opaque type implementing Iterator<Item = &str>, by writing the return in the signature as impl Iterator<Item = &'a str>.

Now, the compiler tells me that because these are opaque types, even if I write exactly the same in the signature of both functions, it treats them as different types. I think I understand why this is, but I don't know how to solve it.

I tried to find solutions online, and found two options:

  1. Use something called trait objects, which I haven't learned about yet because they come much later in the book.
  2. Change the signature to the explicit type on the return.

I'm hesitant about solution 1. The fact that the book proposes this change to the code suggests to me that there should be a solution that doesn't use stuff I haven't learned about yet. Unless it's an oversight?

As for solution 2, I don't know how to figure out the explicit type.

Minimal example


fn one(text: &str) -> impl Iterator<Item = &str> {
    text
        .lines()
        .filter(|line| line.contains("1"))
}
fn two(text: &str) -> impl Iterator<Item = &str> {
    text
        .lines()
        .filter(|line| line.contains("2"))
}

fn main() {
    let choice = true;
    let text = "\
this line contains 1
this line contains 2
this line contains neither";
    let result = if choice {
        one(text)
    } else {
        two(text)
    };
}

The compiler complains in the main function because of a mistmatch in the if block.

1 Like

I believe it is an oversight in the book!

For solution 2, the explicit type you would want is

type Iter<'a> = std::iter::Filter<std::str::Lines<'a>, fn(&&str) -> bool>;

Note that this only works if the closure passed to filter doesn't have any local state, as fn(&&str) -> bool is a raw function pointer. (Closures have unnameable types, but those without state can be coerced to function pointers.)

You can determine the needed type by looking at the return types of the functions of str::lines and Iterator::filter on doc.rust-lang.org/std/.

For solution 1 the way you would use trait objects here is like so.

fn main() {
    let choice = false;
    let text = "\
this line contains 1
this line contains 2
this line contains neither";

    // These must live in the outer scope for lifetime reasons, but
    // should be initialized inside the `if` expression.
    //
    // This syntax for delayed initialization allows that, and the compiler
    // ensures that the variables actually are initialized before reading
    // them.
    //
    // Note that the `mut` here is because we need `result` to be a mutable
    // reference to use the iterator, not because delayed initialization itself
    // requires it.
    let mut result1;
    let mut result2;

    let result: &mut dyn Iterator<Item = &str> = if choice {
        result1 = one(text);
        &mut result1
    } else {
        result2 = two(text);
        &mut result2
    };
}

The two mutable variables result1 and result2 have the opaque types returned by one and two, while result is a (reference to) a trait object, pointing to one or the other.

Since trait objects are not Sized, they must live behind a reference or pointer. We could have alternatively used Box<dyn Iterator<...>> instead of &mut dyn Interator<...> but this way avoids an extra allocation.

I believe the book should explain all of this when you get to the chapter on trait objects.

1 Like

Also, keep in mind that using dyn comes with runtime overhead, and putting that overhead in a hot loop is a bad idea. Combining iterators together with an enum is a more performant approach.

2 Likes

Thanks a lot! The explicit type you mention works for the minimal example. But unfortunately, it fails for the case in the book, because the closure captures an environment variable. So it would be something like:

fn one<'a>(query: &str, text: &'a str) -> ??? {
    text
        .lines()
        .filter(|line| line.contains(query))
}
fn two<'a>(query: &str, text: &'a str) -> ??? {
    text
        .lines()
        .filter(|line| line.contains(query))
}

Can this approach still work with any modifications, or should I accept that I need to use trait objects?

Combining iterators together with an enum is a more performant approach.

Can I do that without knowing the explicit types?

Sure, with generics. Rust Playground

pub enum EitherIter<L, R> {
    Left(L),
    Right(R),
}

impl<L, R, Item> Iterator for EitherIter<L, R>
where
    L: Iterator<Item = Item>,
    R: Iterator<Item = Item>,
{
    type Item = Item;

    fn next(&mut self) -> Option<Self::Item> {
        match self {
            Self::Left(iter) => iter.next(),
            Self::Right(iter) => iter.next(),
        }
    }
}
2 Likes

Sadly the explicit type solution breaks if there are any captures, because the types of closures the capture anything are unnameable.

And even if you could name the type, the two closures in one and two are actually of different types despite being defined identically.

3 Likes

This reply continues along the lines of "how can I get a nameable iterator without dyn", which may be a little off-topic.

The "replace closures with fn(..)" approach can still work if you jump through enough hoops, but it's generally not worth it.

Expand this section to see how it can be done and some discussion of why it's not worth it. I don't recommend it, but it may be instructive on some level.

For example, you need to replace your filter closures with something that doesn't capture query. That could be something that takes query by argument instead.

        // I combined both use cases here, but this approach will work when
        // you use separate functions (the awful return type we get would be
        // the same for both).
        let f: fn(&str, &str) -> bool = if case_sensitive {
            |query, line| line.contains(query)
        } else {
            |query, line| line.to_lowercase().contains(query)
        };

But to be able to pass this to .filter, we somehow need to have both &strs available in the item. Let's say we find a way to get Item = (&str, &str). Then filter takes &Item, so we need to tweak our function pointer a bit:

        let f: fn(&(&str, &str)) -> bool = if case_sensitive {
            |&(query, line)| line.contains(query)
        } else {
            |&(query, line)| line.to_lowercase().contains(query)
        };

Now we need to figure out how to create an iterator over (&str, &str) with our query and the lines. lines() gives us Item = &str. We can use repeat to get another iterator which is just copies of query, and then zip to combine our two &str iterators into an iterator over (&str, &str).

        let iter = zip(repeat(query), content.lines());

That's an iterator we can use .filter(f) on. But we want to return an iterator over our content &strs, so we have to add one more adapter at the end:

        iter.filter(f).map(|tuple| tuple.1)

And the compiler kindly tells us most of the return type we end up with:

   = note: expected unit type `()`
                 found struct `Map<Filter<Zip<std::iter::Repeat<&str>, std::str::Lines<'_>>, for<'a, 'b, 'c> fn(&'a (&'b str, &'c str)) -> bool>, {closure@src/main.rs:36:28: 36:35}>`

:persevering_face:

Well, let's push on anyway. There's a closure for our map in there which should look something like a fn(&str, &str) -> &str. I haven't talked about lifetimes at all, but we're going to have to annotate them. I'm practiced at that, and you're probably not... but I think I'll just give the answer or this reply will get even longer.

When the content we take in the signature is a &'c str, we need the map function pointer to look something like

fn(&str, &'c str) -> &'c str

and the entire signature may look something like this.

    fn combined_search<'q, 'c>(
        query: &'q str,
        content: &'c str,
        case_sensitive: bool,
    ) -> Map<
        Filter<
            Zip<Repeat<&'q str>, std::str::Lines<'c>>, 
            fn(&(&str, &str)) -> bool,
        >,
        fn((&str, &'c str)) -> &'c str,
    > {

You can clean this up some with a type alias. But the convoluted type is one reason this isn't worth it. (Here's the code after doing that.)

What's another reason? We jumped through a lot of hoops so we could use fn(..) instead of Box<dyn Fn(..)> or Box<dyn Iterator<..>>. We did avoid needing to allocate. But our fn(..) pointers are still a form of dynamic dispatch, so we're probably still paying some cost. Also we now have two closures function dispatches instead of one, and we're jumping through more iterator adapters like Repeat and Zip too. We did unify the types and are able to name the type, but I don't know that I consider it a net win over dyn.

Another reason on top of that is that this is fragile -- you can almost determine what the body of the function looks like from the return type now. (If we change our adapters in the function body, it would change our return type.) But that can be mitigated by wrapping up the Map<..> in a new type instead of using a type alias. Then you would implement Iterator for the new type. If you wanted to use any other iterator traits, you'd have to implement them too.

Which brings us to a better alternative to all of that: just implement your own iterator from the start, moving your closure logic into the implementation of the next method.


When I had a need to name some chained iterator in my early Rust days, I actually used types like that. But once I decided it wasn't worth it, I moved to just implementing my own iterators instead. If you don't want to be generic over some internal closure (F: FnMut(..)), you move the closures/closure logic into the next function itself.

    pub struct GrepIter<'query, 'content> {
        query: &'query str,
        lines: Lines<'content>,
        case_sensitive: bool,
    }

    impl<'content> Iterator for GrepIter<'_, 'content> {
        type Item = &'content str;
        fn next(&mut self) -> Option<Self::Item> {
            let lines = &mut self.lines;
            if self.case_sensitive {
                lines.filter(|line| line.contains(self.query)).next()
            } else {
                lines
                    .filter(|line| line.to_lowercase().contains(self.query))
                    .next()
            }
        }
    }

Playground.


To wrap up things more on topic to the OP, we will eventually get a way to name impl Trait types so that we can return the same impl Trait from multiple functions. Note that they have to actually be the same type -- so they can't differ by having distinct closures, like @krsnik02 pointed out.

But the above approaches needed the same actual type too, so we just need to have something like a combined_search[1] function that handles both case sensitive and case insensitive searching.

#![feature(type_alias_impl_trait)]
pub mod minigrep {
    pub type GrepIter<'q, 'c> = impl Iterator<Item = &'c str>;

    #[define_opaque(GrepIter)]
    fn combined_search<'q, 'c>(
        query: &'q str,
        content: &'c str,
        case_sensitive: bool,
    ) -> GrepIter<'q, 'c> {
        content.lines().filter(move |line| {
            if case_sensitive {
                line.contains(query)
            } else {
                line.to_lowercase().contains(query)
            }
        })
    }

Then our two public functions can return GrepIter as well.

That's probably how I would have replied to the OP if the feature was stable.


  1. or GrepIter::new ↩︎

3 Likes