I've been looking at PDF parsing code from this awesome post (which made the rounds on HN a couple of months ago). I really like the author's coding style! Everything looks very clean. I notice a lot more impl Trait in return position than I tend to use.
Unfortunately, when trying to use some of the author's code, I can't seem to do anything with it, in part due to opaque type errors that seem related to -> impl Trait.
But shouldn't Vec::<TextObject>::new().into_iter() implement Iterator<Item=TextObject<'_>> + '_? Or at least IntoIterator<Item=TextObject<'_>> + '_*?
It seems like this ought to work, since the return type of both functions is the same, but it doesn't:
Ultimately, I'm planning to collect the TextObjects into a Vec. Unfortunately, not even a generous smattering of .clone() and .cloned() has gotten this to compile.
* since Vec implements IntoIterator, but this gives me essentially the same error message with .iter() or .into_iter() -- I also tried with Vec::<TextObject>::new().iter().
There seem to be a lot of threads on this topic about async and -> Future<_>; for non-async purposes, here are a few related threads I found:
That's not the problem at all. The problem is that impl Trait in a given context can only ever stand for one concrete type. Therefore, if you return impl Trait from a function, you can't return either of two different concrete types that implement the same trait; you have to return the same type every time.
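A minimal sketch of that rule, using a made-up function `numbers` that is not from the original post. The first function compiles because both branches produce the same concrete type; the commented-out one would be rejected because its branches do not.

```rust
// Hypothetical example; `numbers` and `mixed` are not from the original post.

// Compiles: both branches produce std::vec::IntoIter<u32>,
// so `impl Iterator` stands for exactly one concrete type.
fn numbers(empty: bool) -> impl Iterator<Item = u32> {
    if empty {
        Vec::<u32>::new().into_iter()
    } else {
        vec![1, 2, 3].into_iter()
    }
}

// Does NOT compile if uncommented: the branches have different concrete
// types (a Vec iterator vs. a Map over a Range), and the compiler reports
// that `if` and `else` have incompatible types.
//
// fn mixed(empty: bool) -> impl Iterator<Item = u32> {
//     if empty {
//         Vec::<u32>::new().into_iter()
//     } else {
//         (1..4).map(|n| n * 2)
//     }
// }
```

Calling `numbers(true)` yields an empty iterator and `numbers(false)` yields 1, 2, 3; the caller sees only `impl Iterator<Item = u32>` either way.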
In your case, the parse_tables_on_page() function first tries to return a std::vec::IntoIter<TextObject>, and then it tries to return an impl Iterator<Item=TextObject>. Since the impl Trait return type of text_objects() is opaque, it can potentially be a different type from the Vec iterator, and it actually is different here, because it's a TextObjectParser. Therefore the compiler can't just pretend that it is a Vec iterator. In other words, this assertion of yours is simply not true:
If you want to create an existential type out of multiple, different concrete types, you'll have to use dynamic dispatch, e.g. Box<dyn Iterator<Item=TextObject>>. This sort of construct will simply never work with impl Trait.
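Here is a rough sketch of the boxed version. The `TextObject` struct and both function bodies are simplified stand-ins that I made up for illustration; only the names are borrowed from the thread, and the real code from the post surely looks different.

```rust
// Hypothetical stand-in for the post's type; not the author's real code.
#[derive(Debug, Clone, PartialEq)]
struct TextObject<'src> {
    text: &'src str,
}

// Plays the role of `text_objects()`: its concrete iterator type is hidden.
fn text_objects<'src>(src: &'src str) -> impl Iterator<Item = TextObject<'src>> + 'src {
    src.split_whitespace().map(|text| TextObject { text })
}

// Both arms are boxed and coerced to the same trait object type,
// so it no longer matters that the underlying iterators differ.
fn parse_tables_on_page<'src>(
    src: Option<&'src str>,
) -> Box<dyn Iterator<Item = TextObject<'src>> + 'src> {
    match src {
        Some(s) => Box::new(text_objects(s)),
        None => Box::new(std::iter::empty::<TextObject<'src>>()),
    }
}
```

The cost is one heap allocation and dynamic dispatch on each `next()` call. The result can still be collected as usual, e.g. `parse_tables_on_page(Some("a b")).collect::<Vec<_>>()`.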
I've tried various combinations of <Vec as Iterator<Item = TextObject<'_>> + '_>::new() or <Vec as Iterator<Item = TextObject<'_>> + '_>::new().iter() -- shouldn't that make Vec::new() into an impl Iterator<Item = TextObject<'_>> + '_?
and it actually is different here, because it's a TextObjectParser
But TextObjectParser is Iterator, right? Due to impl<'src> Iterator for TextObjectParser<'src>?
I guess I'm still stuck wrapping my head around the transitivity here...
fn first() -> impl Bar ...
fn second() -> impl Bar {
first()
}
I (think I) understand several other threads here explaining how one needs to return Box<dyn Trait> when the return types are different types that implement Trait, but in this case one is literally returning the other. How can the return types possibly be different, when the second is returning the first?
Further, why doesn't changing the return type to its concrete type help?
One important use of impl Trait is forward compatibility. By returning an opaque type from text_objects(), you're telling the compiler that you might change the return type in the future to something else that also implements Iterator<...>.
The impl Iterator returned from text_objects() is hiding the actual return type from parse_tables_on_page(): The compiler requires that you write code that will work for any type that implements Iterator<Item = TextObject>, because the implementation of text_objects may change to return a different type in the future.
One of the main purposes of opaque types, like impl Trait in return position, is to hide the concrete type so that you can only rely on the trait. This in turn gives the function author the ability to change the concrete type without that being a breaking change. Therefore, no two distinct opaque types can be considered equal -- it would defeat the flexibility. And more generally, the function signature defines the contract, not the body of the function. Exploiting the fact that second()'s returned value came from first() is not allowed, type-wise.
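To make that concrete, here is a small sketch with a made-up trait `Bar` and struct `A` (both invented for illustration). Forwarding `first()` from `second()` is fine; what is rejected is code that relies on the two opaque types being the same.

```rust
// Hypothetical trait and type, invented for this example.
trait Bar {
    fn name(&self) -> &'static str;
}

struct A;

impl Bar for A {
    fn name(&self) -> &'static str {
        "A"
    }
}

fn first() -> impl Bar {
    A
}

// Compiles: `second` has a single concrete return type,
// namely whatever `first` returns.
fn second() -> impl Bar {
    first()
}

// Callers may use only what the `Bar` trait promises.
fn describe() -> &'static str {
    second().name()
}

// Does NOT compile if uncommented: the signatures only say `impl Bar`,
// so the two opaque types are treated as unrelated, even though both
// are `A` underneath.
//
// fn mix() {
//     let _v = vec![first(), second()];
// }
```

Here `describe()` returns "A", but nothing outside `first` and `second` is allowed to depend on the hidden type being `A`.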
The above example that you made up doesn't demonstrate the problem. The problem is not that the return types of first() and second() are different. The following piece of code would demonstrate the problem:
fn first_a() -> impl Bar { … }
fn first_b() -> impl Bar { … }
fn second() -> impl Bar {
if true {
first_a()
} else {
first_b()
}
}
Here, the problem is that the return types of first_a() and first_b() could be different. Although both are annotated to return impl Bar, the underlying concrete types may be different. The key thing to understand here is that impl Trait is not a real type unto itself. It is a placeholder for a specific type, but only the compiler knows what concrete type is behind it.
The whole point of impl Trait is to hide the concrete type behind the interface of the function. But this means that you can't treat different instances of impl Trait as a single type, even for the same Trait, because you don't know what concrete types are behind them; they might be anything. Therefore, two mentions of the same impl Trait must be treated as different by the compiler.
For the same reason: parse_tables_on_page is declared to return a TextObjectParser, but text_objects() is still declared to return an impl Iterator. Since an impl Iterator is not necessarily backed by a TextObjectParser, you aren't allowed to "downcast" it to one. If this were allowed, impl Trait would be no different from declaring the return value to be of a concrete type, and its existence would be pointless.
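A sketch of this situation, with a heavily simplified `TextObjectParser` that just yields `&str` words (my simplification, not the post's actual parser). `text_objects_concrete` is a hypothetical helper that only makes sense if you control the code and are willing to expose the concrete type.

```rust
// Simplified, hypothetical stand-in for the post's parser.
struct TextObjectParser<'src> {
    words: std::str::SplitWhitespace<'src>,
}

impl<'src> Iterator for TextObjectParser<'src> {
    type Item = &'src str;

    fn next(&mut self) -> Option<Self::Item> {
        self.words.next()
    }
}

// The signature hides the fact that a TextObjectParser is returned.
fn text_objects<'src>(src: &'src str) -> impl Iterator<Item = &'src str> + 'src {
    TextObjectParser { words: src.split_whitespace() }
}

// Does NOT compile if uncommented: the compiler expects the struct
// TextObjectParser but finds the opaque type from `text_objects`.
//
// fn parse_tables_on_page<'src>(src: &'src str) -> TextObjectParser<'src> {
//     text_objects(src)
// }

// Hypothetical alternative: name the concrete type in the signature,
// giving up the freedom to change it later.
fn text_objects_concrete<'src>(src: &'src str) -> TextObjectParser<'src> {
    TextObjectParser { words: src.split_whitespace() }
}

fn parse_tables_on_page<'src>(src: &'src str) -> TextObjectParser<'src> {
    text_objects_concrete(src)
}
```

The other direction always works: a function returning the concrete `TextObjectParser` can be called from one that returns `impl Iterator`.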
Thanks to everyone for the quick and thorough replies!
I'll obviously need to do more thinking about when and how to use -> impl Trait in my own code.
For the moment, I seem to have made things better by returning text_objects from each branch instead of trying to figure out whether I could find a way to use something like std::iter::empty() or <Vec<TextObject>>::new().iter().
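In case an empty branch is still needed later, one way to get it without boxing is to wrap the iterator in an Option and flatten it, so that both cases share a single concrete type. This is only a sketch: the bodies below are simplified stand-ins that yield `&str` words, and the `Option<&str>` parameter is my assumption rather than anything from the post.

```rust
// Hypothetical, simplified stand-in for the post's `text_objects()`.
fn text_objects<'src>(src: &'src str) -> impl Iterator<Item = &'src str> + 'src {
    src.split_whitespace()
}

fn parse_tables_on_page<'src>(
    src: Option<&'src str>,
) -> impl Iterator<Item = &'src str> + 'src {
    // An Option is itself iterable (zero or one element), so flattening
    // Option<I> yields either nothing or every item of the inner iterator.
    // Both cases have the same concrete type, which is all that
    // `impl Iterator` requires.
    src.map(text_objects).into_iter().flatten()
}
```

With this, `parse_tables_on_page(None)` is an empty iterator and `parse_tables_on_page(Some("a b c"))` yields three items, with no allocation or dynamic dispatch involved.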