Reuse an iterator without copying

I have a function that takes an IntoIterator as input, and for some completed reason, I need to iterate it twice, the first time nothing will consume:

fn bar<Iter: IntoIterator<Item=(i32, Foo)>>(iter: Iter) {
    iter.into_iter().for_each(|(i, _)| {
        // do not consume anything
    });
    iter.into_iter().for_each(|(i, foo)| {
        // consume foo
    });
}

The compiler suggest me to add the Copy trait to the bound, but this seems not a good idea, so is there another way to accomplished my goal?

The first way is to add Clone bound:

-fn bar<Iter: IntoIterator<Item=(i32, Foo)>>(iter: Iter) {
-    iter.into_iter().for_each(|(i, _)| {
+fn bar<Iter: IntoIterator<Item=(i32, Foo)> + Clone>(iter: Iter) {
+    iter.clone().into_iter().for_each(|(i, _)| {

The second way is to unzip it:

fn bar2<Iter: IntoIterator<Item=(i32, Foo)>>(iter: Iter) {
    let (t1, t2): (Vec<_>, Vec<_>) = iter.into_iter().unzip();
    t1.into_iter().for_each(|i| {
        // do not consume anything
    });
    t2.into_iter().for_each(|foo| {
        // consume foo
    });
}

Or tee in some cases:

use itertools::Itertools;
fn bar3<Iter: IntoIterator<Item=(i32, Foo)>>(iter: Iter) {
    let (t1, t2) = iter.into_iter().tee();
    t1.for_each(|(i, _)| {
        // do not consume anything
    });
    t2.for_each(|(_, foo)| {
        // consume foo
    });
}

#[derive(Clone)]
struct Foo;
2 Likes

I edited my question, the second iteration pass need both i and foo. And in my case, copying a Foo is not an option.

Currently, my solution is to copy all is in a new vector for later use, but I don't like it, since function bar's performance matters, unnecessary malloc/free should be avoided.

This can be considered as an algorithm problem, independent of the particular Rust types you may use for it. You want to write an algorithm whose input is (i32, Foo)s and which must look at the i32s then the Foos. You want its input to consist of some sort of means to obtain (i32, Foo)s one at a time (an iterator).

The conflict is that you want to be able to obtain all the i32s before all the Foos, even though you've established that they are given in pairs one at a time. But being able to use the i32s twice and the Foos later means you must do something differently:

  • take the input in some form other than an iterator (something with more “random access”)
  • allocate temporary storage
  • be able to iterate twice somehow

Here is a possibility for the last: take two iterators. Then, while the numbers must be iterated twice, the Foos do not need to be involved:

fn bar(indexes: impl IntoIterator<Item=i32> + Clone, foos: impl IntoIterator<Item=Foo>) {
    indexes.clone().into_iter().for_each(|i| {
        // do not consume anything
    });
    indexes.into_iter().zip(foos).for_each(|(i, foo)| {
        // consume foo
    });
}

Of course, this only helps if the indexes type is relevantly cheap to clone (say, it is a Range<i32> or an &[i32], or some custom iterator that generates the number sequence — but not itself a vector). But you've got to make bar ask for some means of iterating twice, or allocate a copy itself — otherwise you're asking for time travel.

The definition of an (Into)Iterator says that, unless you add something else like Clone, or have some other particular knowledge about the type, you can't rewind.

13 Likes

You're taking an owned type and saying (Item = (i32, Foo)) you want owned stuff, which means you're going to consume it.

What do you do in Rust to to not consume things? Use borrows.

fn bar<Iter: IntoIterator<Item=(i32, Foo)>>(iter: Iter)
    where for<'a> &'a Iter: IntoIterator<Item = &'a (i32, Foo)>
{
    (&iter).into_iter().for_each(|(i, _)| {
        // do not consume anything
    });
    iter.into_iter().for_each(|(i, foo)| {
        // consume foo
    });
}

https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=1646b1c2141ac41c8c4e492f54d37ea2

12 Likes

May I ask what is where for<'a> &'a Iter: IntoIterator<Item = &'a (i32, Foo)> means? Does that mean "if the Iter is borrowed, then the Item is &'a (i32, Foo)"? What's the different with or without this line when compiling? (I know it's high rank trait bounds, but I never know what it is used for)

And what's more, the function seems to work with vectors, but not works with maps.

For Vec, the where-clause bound in the above case corresponds to impl<'a, T, A> IntoIterator for &'a Vec<T, A> where Item = &'a T.
But for map (like HashMap), the &HashMap's implementation is impl<'a, K, V, S> IntoIterator for &'a HashMap<K, V, S> where Item = (&'a K, &'a V). So you'll need this:

fn bar<Iter: IntoIterator<Item=(i32, Foo)>>(iter: Iter)
    where for<'a> &'a Iter: IntoIterator<Item = (&'a i32, &'a Foo)>

It basically means any &Iter that iterates over &(i32, Foo).

1 Like

Seems that if I want my function takes all kind of iterators, this solution will have problems.

Well, all kinds of iterators would be a lot broader than you may expected. For example you can make an iterator with websocket connection which parse each received messages and yield them directly. Since intermediate messages are not stored anywhere, to iterate it twice you need to store each elements in new vector yourself.

5 Likes

You can still handle a reasonable set of inputs, at least if we're talking about std types or others who have followed the pattern of

  • IntoIterator<Item = Owned> for Self
  • IntoIterator<Item = Borrowed> for &Self
5 Likes