Parsing PDF: expected type found opaque type `impl Iterator`

Hi all,

I've been looking at PDF parsing code from this awesome post (which made the rounds on HN a couple of months ago). I really like the author's coding style! Everything looks very clean. I notice a lot more impl Trait in return position than I tend to use.

Unfortunately, when trying to use some of the author's code, I can't seem to do anything with it, in part due to opaque type errors that seem related to -> impl Trait.

For example:

Error:

mismatched types
expected type std::vec::IntoIter<TextObject<'_>>
found opaque type impl Iterator
to return impl Trait, all returned values must be of the same type
for information on impl Trait, see https://doc.rust-lang.org/book/ch10-02-traits.html#returning-types-that-implement-traits
if the trait Iterator<Item=TextObject<'_>> + '_ were object safe, you could return a boxed trait object
for information on trait objects, see https://doc.rust-lang.org/book/ch17-02-trait-objects.html#using-trait-objects-that-allow-for-values-of-different-types
you could instead create a new enum with a variant for each returned type

But shouldn't Vec::<TextObject>::new().into_iter() implement Iterator<Item=TextObject<'_>> + '_? Or at least IntoIterator<Item=TextObject<'_>> + '_?*

It seems like this ought to work, since the return type of both functions is the same, but it doesn't:

Ultimately, I'm planning to collect the TextObjects into a Vec. Unfortunately, not even a generous smattering of .clone() and .cloned() has gotten this to compile.

fn text_objects(operations: &[Operation]) -> impl Iterator<Item = TextObject<'_>> + '_ {
    TextObjectParser {
        ops: operations.iter()
    }
}

fn parse_tables_on_page(page: &Page) -> impl Iterator<Item = TextObject<'_>> + '_ {
    let content = match &page.contents {
        Some(c) => c,
        None => return Vec::<TextObject>::new().into_iter(),
    };
    text_objects(&content.operations)
}

* since Vec implements IntoIterator, but this gives me essentially the same error message with .iter() or .into_iter() -- I also tried with Vec::<TextObject>::new().iter().

There seem to be a lot of threads on this topic about async and -> Future<_>; for non-async purposes, here are a few related threads I found:

That's not the problem at all. The problem is that impl Trait in a given context can only ever stand for one concrete type. Therefore, if you return impl Trait from a function, you can't return either of two different concrete types that implement the same trait; you have to return the same type every time.

In your case, the parse_tables_on_page() function first tries to return a std::vec::IntoIter<TextObject>, and then it tries to return an impl Iterator<Item=TextObject>. Since the impl Trait return type of text_objects() is opaque, it can potentially be a different type from std::vec::IntoIter, and it actually is different here, because it's a TextObjectParser. Therefore the compiler can't just pretend it were a std::vec::IntoIter. In other words, your assertion that "the return type of both functions is the same" is simply not true.

If you want to create an existential type out of multiple, different concrete types, you'll have to use dynamic dispatch, e.g. Box<dyn Iterator<Item=TextObject>>. This sort of construct will simply never work with impl Trait.
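A minimal sketch of the boxed version follows. The Operation, Page and TextObject definitions here are hypothetical stand-ins with just enough fields to compile, and text_objects() uses a plain map() instead of the original TextObjectParser; only the shape of the return types is the point.

```rust
// Hypothetical stand-ins for the thread's types, just enough to compile.
struct Operation {
    text: String,
}

struct Page {
    contents: Option<Vec<Operation>>,
}

#[derive(Debug, PartialEq)]
struct TextObject<'src> {
    text: &'src str,
}

// One concrete iterator type, hidden behind `impl Iterator`.
fn text_objects(operations: &[Operation]) -> impl Iterator<Item = TextObject<'_>> + '_ {
    operations.iter().map(|op| TextObject { text: op.text.as_str() })
}

// Each branch is boxed, so both coerce to the single type
// `Box<dyn Iterator<Item = TextObject<'_>> + '_>`.
fn parse_tables_on_page(page: &Page) -> Box<dyn Iterator<Item = TextObject<'_>> + '_> {
    match &page.contents {
        Some(ops) => Box::new(text_objects(ops)),
        None => Box::new(std::iter::empty::<TextObject<'_>>()),
    }
}
```

The cost is one heap allocation per call plus a vtable lookup on each next().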


For iterators, you can also use the either crate. It still picks between the two iterators at runtime (by matching on the enum variant rather than going through a vtable), but avoids the heap allocation of Box (untested):

use either::Either;

fn text_objects(operations: &[Operation]) -> impl Iterator<Item = TextObject<'_>> + '_ {
    TextObjectParser {
        ops: operations.iter()
    }
}

fn parse_tables_on_page(page: &Page) -> impl Iterator<Item = TextObject<'_>> + '_ {
    let content = match &page.contents {
        Some(c) => c,
        // Both branches are wrapped in the same enum type, Either<_, _>.
        None => return Either::Right(std::iter::empty::<TextObject>()),
    };
    Either::Left(text_objects(&content.operations))
}
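
If pulling in a crate is not wanted, the same idea can be sketched by hand, which is also what the compiler hint "create a new enum with a variant for each returned type" points at. OneOf and numbers() below are hypothetical names made up for illustration; they are not from the thread or from the either crate.

```rust
// Hypothetical two-variant enum, written by hand for illustration.
enum OneOf<A, B> {
    First(A),
    Second(B),
}

// The enum is itself an iterator when both variants yield the same item type.
impl<A, B> Iterator for OneOf<A, B>
where
    A: Iterator,
    B: Iterator<Item = A::Item>,
{
    type Item = A::Item;

    fn next(&mut self) -> Option<Self::Item> {
        // Runtime branch on the variant; no heap allocation, no vtable.
        match self {
            OneOf::First(a) => a.next(),
            OneOf::Second(b) => b.next(),
        }
    }
}

// Illustrative function: two different iterator types inside,
// but a single concrete return type, `OneOf<Range<u32>, Empty<u32>>`.
fn numbers(up_to: Option<u32>) -> impl Iterator<Item = u32> {
    match up_to {
        Some(n) => OneOf::First(0..n),
        None => OneOf::Second(std::iter::empty::<u32>()),
    }
}
```

Because every return path produces the same enum type, the "all returned values must be of the same type" rule is satisfied.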

Thanks for your time!

I've tried various combinations of <Vec as Iterator<Item = TextObject<'_>> + '_>::new() or <Vec as Iterator<Item = TextObject<'_>> + '_>::new().iter() -- shouldn't that make Vec::new() into an impl Iterator<Item = TextObject<'_>> + '_?

"and it actually is different here, because it's a TextObjectParser"

But TextObjectParser is Iterator, right? Due to impl<'src> Iterator for TextObjectParser<'src>?

I guess I'm still stuck wrapping my head around the transitivity here...

fn first() -> impl Bar ...
fn second() -> impl Bar {
   first()
}

I (think I) understand several other threads here explaining how one needs to return Box<dyn Trait> when the return types are different types that implement Trait, but in this case one is literally returning the other. How can the return types possibly be different, when the second is returning the first?

Further, why doesn't changing the return type to its concrete type help?

fn parse_tables_on_page(page: &Page) -> TextObjectParser {
    let content = match &page.contents {
        Some(c) => c,
        None => return TextObjectParser{ops: Vec::new().iter()},
    };
    text_objects(&content.operations)
}

Same error:

mismatched types
expected struct TextObjectParser<'_>
found opaque type impl Iterator
to return impl Trait, all returned values must be of the same type
for information on impl Trait, see https://doc.rust-lang.org/book/ch10-02-traits.html#returning-types-that-implement-traits
you could instead create a new enum with a variant for each returned type

One important use of impl Trait is forward compatibility. By returning an opaque type from text_objects(), you're telling the compiler that you might change the return type in the future to something else that also implements Iterator<...>.

The impl Iterator returned from text_objects() is hiding the actual return type from parse_tables_on_page(): The compiler requires that you write code that will work for any type that implements Iterator<Item = TextObject>, because the implementation of text_objects may change to return a different type in the future.


One of the main purposes of opaque types, like impl Trait in return position, is to hide the concrete type so that you can only rely on the trait. This in turn gives the function author the ability to change the concrete type without that being a breaking change. Therefore, no two distinct opaque types can be considered equal -- it would defeat the flexibility. And more generally, the function signature defines the contract, not the body of the function. Exploiting the fact that second()'s returned value came from first() is not allowed, type-wise.
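
A small sketch of that contract, using a made-up Bar trait and Concrete struct in the spirit of the first()/second() pseudocode earlier in the thread (none of these names come from the original code):

```rust
// Hypothetical trait and type, mirroring the `Bar` pseudocode in the thread.
trait Bar {
    fn name(&self) -> String;
}

struct Concrete;

impl Bar for Concrete {
    fn name(&self) -> String {
        "concrete".to_string()
    }
}

fn first() -> impl Bar {
    Concrete
}

// Fine: every return path yields the same opaque type, namely `first()`'s.
fn second() -> impl Bar {
    first()
}

// Rejected if uncommented: the body of `first()` does return `Concrete`,
// but its signature only promises "some type implementing `Bar`".
// fn third() -> Concrete {
//     first() // error[E0308]: mismatched types
// }
```

So forwarding a single opaque value compiles; what fails is mixing it with any other type, or assuming which type is behind it.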


Yes, but that doesn't matter.

The above example that you made up doesn't demonstrate the problem. The problem is not that the return types of first() and second() are different. The following piece of code would demonstrate the problem:

fn first_a() -> impl Bar { … }
fn first_b() -> impl Bar { … }

fn second() -> impl Bar {
    if true {
        first_a()
    } else {
        first_b()
    }
}

Here, the problem is that the return types of first_a() and first_b() could be different. Although both are annotated to return impl Bar, the underlying concrete types are different. The key thing to understand here is that impl Trait is not a real type unto itself. It is a placeholder for a specific type, but only the compiler knows what concrete type is behind it.

The whole point of impl Trait is to hide the concrete type behind the interface of the function. But this means that you can't treat different instances of impl Trait as a single type, even for the same Trait, because you don't know what concrete types are behind them – they might be anything. Therefore, two mentions of the same impl Trait must be treated as different by the compiler.

Changing the return type to the concrete type doesn't help, for the same reason. parse_tables_on_page is declared to return a TextObjectParser, but text_objects() is still declared to return an impl Iterator. Since an impl Iterator is not necessarily backed by a TextObjectParser, you aren't allowed to "downcast" it to one. If this were allowed, impl Trait would be no different from declaring the return value to be of a concrete type, and its existence would be pointless.


Thanks to everyone for the quick and thorough replies!

I'll obviously need to do more thinking about when and how to use -> impl Trait in my own code.

For the moment, I was able to seemingly make things better by returning text_objects from each branch instead of trying to figure out whether I could find a way to use something like std::iter::empty() or <Vec<TextObject>>::new().iter().

fn parse_tables_on_page(page: &Page) -> impl IntoIterator<Item = TextObject> {
    let content = match &page.contents {
        Some(c) => c,
        None => return text_objects(&[]),
    };
    text_objects(&content.operations)
}
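
For anyone reading later, here is a self-contained sketch of why this compiles. Operation and Page are simplified, hypothetical stand-ins, and the items are plain &str instead of TextObject; the point is only that both branches call the same function, so both produce the same opaque type.

```rust
// Simplified, hypothetical stand-ins for the real types.
struct Operation {
    text: String,
}

struct Page {
    contents: Option<Vec<Operation>>,
}

fn text_objects(operations: &[Operation]) -> impl Iterator<Item = &str> + '_ {
    operations.iter().map(|op| op.text.as_str())
}

fn parse_tables_on_page(page: &Page) -> impl Iterator<Item = &str> + '_ {
    let content = match &page.contents {
        Some(c) => c,
        // Same function as below, so the same opaque type;
        // an empty slice simply yields an empty iterator.
        None => return text_objects(&[]),
    };
    text_objects(content)
}
```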

Next up: my old friend cannot return value referencing local variable :laughing:

Thanks to all for the help!
