How does "Iterator::collect" make transformation?

imic · January 15, 2023, 2:45pm

Say I have code

fn main() {
    use std::iter::Filter;
    use std::vec::IntoIter;
    let it: IntoIter<i32> = vec![1, 2, 3, 4, 5].into_iter();
    let it2: Filter<IntoIter<i32>, _> = it.filter(|e| e % 2 == 0);
    let coll: Vec<i32> = it2.collect();

    println!("{:?}", coll);
}

If the program above is correct, it2.collect() will trigger a call to FromIterator::from_iter(), which in turn is a call to Vec<T>::from_iter().

It is well known that filter is lazy, and the transformation takes effect only when collect or next is called. But inside the standard library, I haven't seen where the closure |e| e % 2 == 0 (namely Filter's predicate field) is called. Is my inference correct? If so, when is it called?

Thanks in advance!

steffahn · January 15, 2023, 3:16pm

The standard library is so full of optimizations that it’s sometimes hard to spot the relevant parts.

This applies especially to the Vec::from_iter implementation. Essentially it boils down to it2.collect() being implemented (mostly) by one call to it2.for_each(…). Then, for_each is a story of its own… there is a default implementation in terms of .next() for every iterator that doesn’t override it, but Filter does because that results in slightly more optimal code.

Either way, if you interpret it2.collect() to be essentially implemented either as

let mut coll = vec![];
for item in it2 {
    coll.push(item);
}

or

let mut coll = vec![];
it2.for_each(|item| coll.push(item));

and it2 is of type Filter, we are close to seeing the place where the closure |e| e % 2 == 0 is called. The for loop uses .next() calls in a loop, while for_each is its own method; either way, we can look at the implementation of Iterator for Filter and try to spot the call to the closure.
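Before reading the source, the timing can also be observed from the outside. The following is a minimal sketch, not anything the standard library does: it wraps the predicate so that it bumps a counter (the helper name observe_predicate_calls and the Cell counter are made up for illustration) and compares the count before and after collect.

```rust
use std::cell::Cell;

/// Hypothetical helper: returns (predicate calls before `collect`,
/// predicate calls after `collect`, collected items).
fn observe_predicate_calls() -> (usize, usize, Vec<i32>) {
    // Shared counter; `Cell` lets the closure update it through `&calls`.
    let calls = Cell::new(0usize);

    let it2 = vec![1, 2, 3, 4, 5].into_iter().filter(|e| {
        calls.set(calls.get() + 1); // record every predicate invocation
        e % 2 == 0
    });

    // `filter` only built the adapter; nothing has been iterated yet.
    let before = calls.get();

    // Consuming the iterator is what drives the predicate.
    let coll: Vec<i32> = it2.collect();
    let after = calls.get();

    (before, after, coll)
}

fn main() {
    let (before, after, coll) = observe_predicate_calls();
    println!("before collect: {before}"); // before collect: 0
    println!("after collect:  {after}"); // after collect:  5
    println!("{:?}", coll); // [2, 4]
}
```

The counts do not depend on which internal path collect takes: every path has to run the predicate once per element of the inner iterator.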

The Filter type by the way simply bundles up the inner iterator and the predicate closure

pub struct Filter<I, P> {
    iter: I,
    predicate: P,
}

And the next method looks like

    fn next(&mut self) -> Option<I::Item> {
        self.iter.find(&mut self.predicate)
    }

Well… that uses find now, doesn’t it? But if it didn’t: this code is mostly equivalent to something like

    fn next(&mut self) -> Option<I::Item> {
        while let Some(item) = self.iter.next() {
            if (self.predicate)(&item) {
                return Some(item); // stops the loop, returns from the function
            } // otherwise, continue with the next
        }
        // in case that no more matching items are found:
        None
    }
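A small aside on the `self.iter.find(&mut self.predicate)` call in the real next shown earlier: find takes its predicate by value, but a `&mut` reference to an FnMut closure is itself an FnMut closure, so the same predicate can be lent out once per call. The sketch below shows that pattern outside of Filter; the function name find_repeatedly is invented for the example.

```rust
/// Hypothetical demo: call `find` several times on the same iterator,
/// lending the same predicate each time via `&mut`.
fn find_repeatedly() -> (Option<i32>, Option<i32>, Option<i32>) {
    let mut inner = vec![1, 2, 3, 4, 5].into_iter();
    let mut predicate = |e: &i32| e % 2 == 0;

    // Each call resumes where the previous one stopped, which is exactly
    // what a `next` built on top of `find` relies on.
    let first = inner.find(&mut predicate);
    let second = inner.find(&mut predicate);
    let third = inner.find(&mut predicate);

    (first, second, third)
}

fn main() {
    let (first, second, third) = find_repeatedly();
    println!("{:?} {:?} {:?}", first, second, third); // Some(2) Some(4) None
}
```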

For the actual find implementation: impl Iterator for Filter does not define find itself, so the default is used. It falls back to a call to a method called try_fold… I mean, if you want, go study its signature; I’m trying not to go too deep into detail here, which is why many code examples merely paraphrase the gist/implementation idea of stuff, not literally what the standard library uses. try_fold is a lot like a for_each method, but additionally with the possibility of passing along owned state, and also with the possibility of “failure”/early termination (which is crucial for implementing find). Fun fact: try_fold is so general that you could even implement .next() in terms of try_fold, too.

Now, as you can see in the linked implementation, Filter also implements try_fold itself as well as fold, but unlike what I claimed above, it actually does not directly override for_each, since it’s so similar to fold. So for_each has a default implementation in terms of fold that is used.
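To make the try_fold remarks a bit more concrete, here is a sketch of how find and next could be written on top of try_fold. This paraphrases the idea only and is not the standard library's source; the free functions find_via_try_fold and next_via_try_fold are hypothetical names, and ControlFlow is used as the early-exit type.

```rust
use std::ops::ControlFlow;

/// Hypothetical `find` in terms of `try_fold`: there is no real
/// accumulator (it is `()`), and `Break` carries out the first match.
fn find_via_try_fold<I, P>(iter: &mut I, mut pred: P) -> Option<I::Item>
where
    I: Iterator,
    P: FnMut(&I::Item) -> bool,
{
    let flow = iter.try_fold((), |(), item| {
        if pred(&item) {
            ControlFlow::Break(item) // early termination with the match
        } else {
            ControlFlow::Continue(()) // keep going
        }
    });
    match flow {
        ControlFlow::Break(item) => Some(item),
        ControlFlow::Continue(()) => None, // inner iterator ran out
    }
}

/// Hypothetical `next` in terms of `try_fold`: break on the very first item.
fn next_via_try_fold<I: Iterator>(iter: &mut I) -> Option<I::Item> {
    find_via_try_fold(iter, |_| true)
}

fn main() {
    let mut it = vec![1, 2, 3, 4, 5].into_iter();
    println!("{:?}", find_via_try_fold(&mut it, |e| e % 2 == 0)); // Some(2)
    println!("{:?}", next_via_try_fold(&mut it)); // Some(3)
}
```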

To give a simplified view of what a for_each implementation of Filter could look like and what its benefits are: This is a possible for_each implementation, quite similar to the actual try_fold implementation:

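    fn for_each<F>(self, mut f: F)
    where
        Self: Sized,
        F: FnMut(Self::Item),
    {
        // move the predicate out so the closure below can call it mutably
        let mut predicate = self.predicate;
        self.iter.for_each(|item| {
            if predicate(&item) {
                f(item)
            }
        });
    }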

Compare this to using a default-implementation for for_each which would use next in a loop

    fn for_each<F>(mut self, mut f: F)
    where
        Self: Sized,
        F: FnMut(Self::Item),
    {
        while let Some(item) = self.next() {
            f(item);
        }
    }

Now, with the .next() implementation above, which is itself a loop, using this default for_each would result in a loop inside a loop. The manually written implementation, on the other hand, is simply an if statement inside a loop; that is slightly more straightforward and could therefore be more performant. Even more relevant: the if-statement-in-a-loop approach loops through the inner iterator a single time without any breaks, whereas the .find-based implementation of next, called in a loop, stops and restarts iteration on the inner iterator many times. There are a few iterator types (e.g. Chain, which needs to check which half of the chain it’s in every time iteration is paused and resumed) where looping in one go is significantly better than calling next (or try_fold) a bunch of times in a row, and that’s a main reason why implementing more than just .next in the Iterator for Filter<…> implementation is beneficial for performance.
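As a rough illustration of the "override fold, let for_each follow" idea, here is a sketch of a stand-in adapter. MyFilter, its fold_calls counter field, and run_for_each are all invented for this example; the fold body paraphrases the if-inside-a-loop shape described above rather than quoting the standard library. The counter relies on the default for_each being implemented in terms of fold, as described earlier.

```rust
use std::cell::Cell;

/// Hypothetical stand-in for `std::iter::Filter`, with an extra counter
/// so we can see whether `fold` was the method that drove the iteration.
struct MyFilter<'a, I, P> {
    iter: I,
    predicate: P,
    fold_calls: &'a Cell<u32>,
}

impl<'a, I, P> Iterator for MyFilter<'a, I, P>
where
    I: Iterator,
    P: FnMut(&I::Item) -> bool,
{
    type Item = I::Item;

    // Element-at-a-time path: stops and resumes the inner iterator.
    fn next(&mut self) -> Option<I::Item> {
        self.iter.find(&mut self.predicate)
    }

    // One-pass path: a single run over the inner iterator with an `if`.
    fn fold<Acc, F>(self, init: Acc, mut f: F) -> Acc
    where
        F: FnMut(Acc, I::Item) -> Acc,
    {
        self.fold_calls.set(self.fold_calls.get() + 1);
        let mut predicate = self.predicate;
        self.iter.fold(init, move |acc, item| {
            if predicate(&item) { f(acc, item) } else { acc }
        })
    }
}

/// Hypothetical driver: runs `for_each` (not overridden above) and
/// reports the visited items plus how often `fold` was entered.
fn run_for_each() -> (Vec<i32>, u32) {
    let fold_calls = Cell::new(0);
    let filter = MyFilter {
        iter: vec![1, 2, 3, 4, 5].into_iter(),
        predicate: |e: &i32| e % 2 == 0,
        fold_calls: &fold_calls,
    };
    let mut out = Vec::new();
    filter.for_each(|item| out.push(item));
    (out, fold_calls.get())
}

fn main() {
    let (out, fold_calls) = run_for_each();
    println!("{:?}, fold entered {} time(s)", out, fold_calls);
}
```

With the default for_each going through fold, the whole filtered traversal happens inside one call to the inner iterator's fold, instead of one find call per yielded item.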

steffahn · January 15, 2023, 3:27pm

For further illustration, here’s a complete minimal (and less performant) “fake” re-implementation of all the functions involved

trait MyIterator {
    type Item;
    fn next(&mut self) -> Option<Self::Item>;
}

struct VecIter<T>(Vec<T>);
fn vec_into_iter<T>(v: Vec<T>) -> VecIter<T> {
    let mut v = v;
    v.reverse(); // reverse, since the iterator uses pop
    // the standard library doesn't do something like this, they use 
    // unsafe code to *actually* consume the `Vec` starting from the front
    VecIter(v)
}
impl<T> MyIterator for VecIter<T> {
    type Item = T;
    fn next(&mut self) -> Option<T> {
        self.0.pop()
    }
}

fn filter<I: MyIterator, F: FnMut(&I::Item) -> bool>(it: I, pred: F) -> Filter<I, F> {
    Filter { it, pred }
}
struct Filter<I, F> {
    it: I,
    pred: F,
}
impl<I: MyIterator, F: FnMut(&I::Item) -> bool> MyIterator for Filter<I, F> {
    type Item = I::Item;
    fn next(&mut self) -> Option<I::Item> {
        while let Some(item) = self.it.next() {
            if (self.pred)(&item) {
                return Some(item);
            }
        }
        None
    }
}

fn collect_vec<T>(it: impl MyIterator<Item = T>) -> Vec<T> {
    let mut v = vec![];
    let mut it = it;
    while let Some(item) = it.next() {
        v.push(item);
    }
    v
}

fn main() {
    let it: VecIter<i32> = vec_into_iter(vec![1, 2, 3, 4, 5]);
    let it2: Filter<VecIter<i32>, _> = filter(it, |e| e % 2 == 0);
    let coll: Vec<i32> = collect_vec(it2);

    println!("{:?}", coll);
}


imic · January 17, 2023, 2:18pm

@steffahn Thanks for your discourse on the implementation of Iterator::collect; it is non-trivial and quite helpful, and I enjoyed it. Meanwhile, my original confusion was perfectly resolved on another forum. For reference, and for anyone else who might be interested in this topic in the future, I will put the link below.

imic · January 17, 2023, 2:21pm

Solved by rust - How does "Iterator::collect" make transformation? - Stack Overflow

