Type of Functions returning an Iterator


#1

I would like to write a function from Iterator<Item=A> to Iterator<Item=B>. For example:

fn test_iterator<I,O>(input: I) -> O
    where I: Iterator<Item=i32>,
          O: Iterator<Item=String> {
    input.map(|x| x.to_string());
}

I get the compiler error:

expected type parameter, found struct std::iter::Map

Next try was to give it this more specific type:

fn test_iterator<I,F>(input: I) -> Map<I,F>
    where I: Iterator<Item=i32>,
          F: FnMut(i32) -> String {
    input.map(|x| x.to_string())
}

This gives me the error:

expected type parameter, found closure
= note: expected type F
found type [closure@parser.rs:14:15: 14:32]

Is there a possibility to annotate the return type of the function?

I don’t want to go to something like Vec, because I would like to have composable functions on Iterators, e.i. I don’t want:

fn test_iterator<I>(input: I) -> Vec<String>
    where I: Iterator<Item=i32> {
    input.map(|x| x.to_string()).collect()
}

I think it boils down to the same question how to explicitly annotate the type of r1 and r2 in the following example:

fn test_iterator2<I>(input: I) -> Vec<String>
    where I: Iterator<Item=i32> {
    let r1 = input.map(|x| x.to_string());
    let r2 = r1.map(|s| s.to_uppercase());
    r2.collect()
}

Am I missing something or is this kind of generic programming not possible in Rust?

Regards, Kathi


#2

Short answer: you cannot do this yet, but an upcoming Rust feature, impl Trait, will soon come to the rescue and save the day. Ugly workarounds exist in the meantime.

Long answer: when you write a function signature like this…

fn test_iterator<I,O>(input: I) -> O
    where I: Iterator<Item=i32>,
          O: Iterator<Item=String> {
    /* ... */
}

…what you are saying is “I accept an iterator type I of the caller’s choosing as input, and I will return an iterator type O of the caller’s choosing as output”.

The first part of this sentence is something that you want. The second part of the sentence, and especially the highlighted part, most likely wasn’t what you intended.

What you want instead is to state “I accept an iterator type I of the caller’s choosing as input, and I will return an iterator type O of my choosing as output”.

There is an unstable Rust feature for that, called impl Trait. It lets you specify in your function signature “I can return anything which implements a certain trait”, which will change the world by enabling use cases like the one you have in mind. But it’s not available in the stable compiler yet because it needs to be ironed out a little more.

As a mitigation, you can write a struct which implements the Iterator<Item=String> trait yourself, and return that. Here’s a rough sketch of this workaround in action:

struct ToStringIterator<I : Iterator<Item=i32>> {
    source: I
}

impl Iterator for ToStringIterator {
    type Item = String;

    fn next(&mut self) -> Option<Self::Item> {
        return self.source.next().map(|i| i.to_string())
    }
}

fn test_iterator<I>(input: I) -> ToStringIterator<I>
    where I: Iterator<Item=i32> {
    /* ... */
}

#3

Thanks for the explanation!

Yes, that’s true. That’s why I tried to give Map<I,F> as a result type. But I see that your argument in this case applies to the FnMut(i32), isn’t it?
Do I understand it right that it is also not possible to explicitly annotate the type of r in

fn test_iterator2<I>(input: I) -> Vec<String>
    where I: Iterator<Item=i32> {
    let r = input.map(|x| x.to_string());
    r.collect()
}

And the reason is that I would have to annotate the type of the anonymous struct which the compiler creates to implement the closure?


#4

That’s right. Closures have an anonymous type, so you cannot explicitly write down any type annotation which involves a closure type.


#5

But why is the signature of map inside the Iterator trait correct?

fn map<B, F>(self, f: F) -> Map<Self, F>
    where
        F: FnMut(Self::Item) -> B,
    { ... }

Is it because F appears both in input and output position? And does it use the same trick you showed me above under the hoods? E.i, create a new struct (Map) and make it implement the Iterator trait?


#6

When the compiler sees a call to Iterator::map, with a callable as an argument, it knows what the type of the callable is, even if it is an anonymous closure type that you cannot name. So it can substitute “F” with that type in the implementation of the map() function and in the Map struct.

The Map implementation, on its side, will look similar to this (modulo usage of some unstable variant of the FnMut trait to avoid naming the result type B explicitly):

struct Map<I: Iterator,  // Initial iterator type
           B,  // Output of the final iterator
           F: FnMut(I::Item) -> B  // Mapping function
> {
    source: I,
    mapping: F,
}

impl<I, B, F> Iterator for Map<I, B, F> {
    type Item = B;

    fn next(&mut self) -> Option<Self::Item> {
        self.source.next().map(self.mapping)
    }
}

In this situation, the compiler can safely store the closure type F inside of the struct Map, even if you cannot name it, because it inferred that type from the argument that you passed to the Iterator::map method.

EDIT: Just to clarify, the main difference between Iterator::map and you example is that in Iterator::map, the caller decides what is the type of F, whereas in your example it is an implementation detail of the function which the caller should not need to know about.


#7

Well untill impl trait you can always box the iterator


fn test_iterator<'a, I, T>(input: I) -> Box<Iterator<Item = String> + 'a>
where
    I: Iterator<Item = T> + 'a,
    T: Display,
{
    Box::new(input.map(|x| x.to_string()))
}

#8

You are right!

Come to think of it, this tokio page on workarounds for lack of impl Trait is also relevant when working with iterators: https://tokio.rs/docs/going-deeper-futures/returning/ .


#9

This clarifies a lot! And after reading it I figured out I can annotate the type of the variable in my example above as:

fn test_iterator2<I>(input: I) -> Vec<String>
    where I: Iterator<Item=i32> {
    let r: Map<I,fn(i32) -> String> = input.map(|x| x.to_string());
    r.collect()
}

and hence I can also write

fn test_iterator<I,F>(input: I) -> Map<I,fn(i32) -> String>
    where I: Iterator<Item=i32> {
    input.map(|x| x.to_string())
}

But it seems to work only, because it’s no “real” closure, e.i. there is no environment to capture and hence it can be seen as a simple function pointer fn(i32) -> String. Is there some automatic coersion going on? Since I cannot write

fn test_iterator<I,F>(input: I) -> Map<I,fn(i32) -> String>
    where I: Iterator<Item=i32> {
    let y = 5;
    input.map(|x| (x+y).to_string())
}

The compiler then complains about expecting a fn pointer and finding a closure instead.

One question regarding the use of Trait Objects: is there a conceptual difference between Trait Objects using a Box<Foo> and Trait Objects using &Foo other than

  • &Foo is always borrowed whereas
  • Box<Foo> is a “pointer” to an owned Foo and Box<&Foo> behaves more like &Foo?

In the Rust Book they call both concepts Trait Objects so I was wondering if they are desugared to the same low level code.


#10

Yes - Rust 1.19 landed https://github.com/rust-lang/rust/pull/42162.


#11

The conceptual difference is the ownership difference. Box<Foo> internally has a fat pointer where the data portion lives on the heap. &Foo is also a fat pointer but the data might be on the stack. No real conceptual or representation difference - just the two data pointers are pointing at different parts of the address space (but stack is more likely to be in cache).

Box<&Foo> is a double indirection - it’s a pointer to a fat pointer; &Foo is just the fat pointer itself. But behavior wise it’ll be the same.


#12

I think the Rust type system is sometimes hard to grasp, because type theoretic reasoning is often mixed with implementation specific reasoning. I’m still trying to get the things clear in my head :wink:

What I figured:

fn test_iterator<I: Iterator<Item=i32>,O: Iterator<Item=String>>(input: I) -> O {
     input.map(|x| x.to_string())
}

This example does not work. The type would be (in some type “pseudocode”)

  • forall I:It<i32>,O:It<String>: I -> O

and as @HadrienG pointed out this is not what we want, because we choose the return type and not the caller.

Hence we rather want this:

fn test_iterator_impl<I: Iterator<Item=i32>>(input: I) -> impl Iterator<Item=String>{
    input.map(|x| x.to_string())
}

giving us a type

  • forall I:It<i32>: (I -> exists O: O:It<String>)

(Just curious: doesn’t the return type of a function need to have a size known at compile time? If yes, how does impl Trait work? There is no Box or other dynamic dispach, isn’t it?)

This raises the question if we can move the forall quantifier further in, e.i. giving us the type

  • (forall I:It<i32>) -> (exists O:It<i32>)

But a naive try fails:

fn test_iterator_impl2(input: Iterator<Item=i32>) -> impl Iterator<Item=String> {
    input.map(|x| x.to_string())
}

Here the compiler complains about

  • the size of the argument not beeing known at compile time (which seems to be a restriction due to the implementation)
  • not beeing able to call map on a trait object (which means that somethow there are also trait objects Foo and not only &Foo? Or is there some coersion going on?)

So apparently we need an indirection, because of how Rust is implemented (but am I right to say, that this is no type theoretic limitation?):

fn test_iterator_mixed(input: Box<Iterator<Item=i32>>) -> impl Iterator<Item=String> {
    input.map(|x| x.to_string())
}

This does not compile and gives the error

  • only named lifetimes are allowed in impl Trait, but ` ` was found in the type `std::iter::Map<std::boxed::Box<std::iter::Iterator<Item=i32>>, [closure@main.rs:62:15: 62:32]>`

Does anyone know what this means?

Ok, so my last try ist then:

fn test_iterator_boxed(input: Box<Iterator<Item=i32>>) -> Box<Iterator<Item=String>> {
    Box::new(input.map(|x| x.to_string()))
}

which should have a type like

  • (forall I:It<i32>) -> (exists O:It<i32>)

because we can put anything inside the Box and get something inside the result Box.
The fact that we have to mention the Box explicitely seems to be Rust specific. In other languages where Objects are always accessed by reference (e.g. Java), we could write something like

fn test_iterator_plain(input: Iterator<Item=i32>) -> Iterator<Item=String> {
    (input.map(|x| x.to_string())
}

or do I get something wrong here?


#13

When the compiler sees an “impl Trait” as a function’s return type, it is allowed to go have a look at the function’s implementation, see what concrete type will be returned, and substitute that.

The reason this doesn’t break the abstraction is that the caller is only allowed to leverage the interface characteristics of the output type which are exposed by the chosen trait.

Your second example does not work because a trait is not a concrete type, which is what a function call requires. It could be made to work by using impl Trait in argument position, which is a feature that has been accepted, but AFAIK not been implemented yet:

fn test_iterator_impl3(input: impl Iterator<Item=i32>) -> impl Iterator<Item=String> {
    input.map(|x| x.to_string())
}

impl Trait in function argument position is effectively just syntactic sugar for the generic signature that you initially used. But it allows the input and output types of a given function to be expressed using a single consistent syntax.

As soon as you start using a Box, you stop using static compile-time dispatch, and enter the realm of dynamic memory allocation and dynamic dispatch, which has different rules and performance characteristics. The fact that the syntax for static and dynamic dispatch are so close is a common criticism of Rust, and there is a proposal to disambiguate it more by introducing a “dyn” keyword, like so:

  • Box<dyn Trait>
  • &dyn Trait

The error that you get in test_iterator_mixed is linked to the fact that impl Trait does not play very nicely with lifetimes yet, which is one reason why this feature has not yet been released on the stable channel. I think you could fix it by adding a + 'static' at the end of your return type (which is the lifetime of the contents of your Box), but don’t quote me on this as I don’t use nightly and cannot check.

Rust makes you use a Box explicitly so that it is clear, inside of your code, what is stack-allocated and what is heap-allocated. This is important as heap allocation requires runtime support and has nontrivial costs. Other languages like Java do not care about exposing this cost, but for Rust’s target audience of low-level, performance-focused programmers, exposing it is the best tradeoff.


#14

Then the error message is kind of misleading saying something about not beeing able to call map on trait objects… But ok, this reasoning is straight forward otherwise.

I’m not sure if it is exactly the same. What would the compiler do if it sees an impl Trait in argument position? I suppose that it generates a specialized function for each concrete call as to know the size of the argument at compile time. Or is there some magic going on?
If it’s the case, then impl Trait in argument position would only be syntactic sugar for a generic <I: Trait> and an argument of type I?
For an impl Trait in output position however there is no quantification - and only one function with an (internal) concrete output type.


#15

Yes, that’s a product of the fact that the Rust syntax for dynamic dispatch (aka “trait objects”) is to use just the name of the trait in a context where a type is expected, such as in &Trait. Again dyn trait will hopefully fix this someday, if people can agree on that breaking change. In the meantime, the compiler warning could be improved in isolation.

This is exactly my understanding of it.

Exactly, impl Trait in argument and result position are not implemented in the same way, which is why impl Trait in argument position was not part of the original proposal. From my understanding, it’s just that people thought it would have nicer ergonomics to be able to stick with a single syntax for “I want something that implements a trait” when they want to.


#16

I checked - it compiles :slight_smile:


#17

test_iterator_impl is not a function. It is a template. You can’t make a function pointer to it; only to particular functions it creates. Either specified with super-fish or inferred by setting the type.
The compiler works out the the return structure which has a fixed size. impl Trait just says you shouldn’t care about what the specific structure type is only that it will be limited to that offered by the Trait.


#18

We don’t use “template” in Rust since Rust’s generics don’t work like C++'s templates do. In a generic sense, this is a good way to think about it, just don’t draw too many connections to how it works in C++.


#19

Is this due to the generic arguemnt <I: Iterator<Iterm=i32>> or because of the impl Iterator<Item=String>? If I understood it right the compiler figures out the concrete return type once, so there won’t be different functions differing in the return type. Put otherwise: would a “function” with the signature

fn something() -> impl Iterator<Item=String> { ... }

be a “proper” function?

How is it called the Rust way?

I was wondering if there is some formal specification of the Rust type system? I couldn’t find any yet.


#20

Answering this first. The field is of generics. The document uses generic function. The important point is that it isn’t a machine code function rather a building block for the compiler to make functions from when needed.
The doc also calls the I as being “Generic type parameter name”, to me it is the generic.
Right of the colon is the bounds. (Unsure if calling it a constraint is the rust way.)

Entirely down to the generic and not the impl Trait.
What the complier does would be better answered by someone else but my guess; it resolves all identifiers confirming all calls meet whatever bounds have been set and types get partially resolved to generic structures. Code that calls the function then fixes in place the concrete types and (either reuses the last time that the same types were used or if not used before;) creates a function from the partially resolved one.
No idea how this all interacts with intermediate representations that the compiler uses.
(The important thing from C++ point is there is no nightmare 2 phase lookup and everything is constrained so avoids many errors.)

something is. I think I would just go an describe it a plain function.

Bit extra;

fn something_2() -> impl Iterator<Item=String> { ... }

impl Trait creates anonymous unique types so the return of something() and something_2() are different types.

Some of the comments are mentioning impl Trait parameter types, AFIAK seem to be slightly different, aren’t yet implemented in nightly and seem (from the little I have read) to be roughly a third way to specify bounds. I think intent is to simply (which return impl Trait does.) but to me even with having both : and where makes the language more confusing.