Use an iterator twice?

I have IntoIterator as argument.
Is that possible to use iterator got from it twice without using collect and allocating extra memory?

You can require that Iterator: Clone and only clone the iterator.

2 Likes

In the general case, no - std::iter::Iterator makes no promises, and requires nothing from implementations, that would allow you to go back to the start of the iteration. To make this concrete, one valid implementation of Iterator might be

struct Counter(u8);

impl std::iter::Iterator for Counter {
    type Item = u8;
    
    fn next(&mut self) -> Option<u8> {
        let v = self.0;
        self.0 += 1;
        Some(v)
    }
}

While it's clearly possible to reset this iterator (set the value back to zero and call next again), nothing actually exposed by the type does that. You can also imagine a version where the next implementation calls an RNG or reads from a file (ignoring IO errors).

However, you can constrain the type of the iterator further (H2CO3 already demonstrated one option) or the type of the IntoIterator. Iterators that can be cloned implement Clone, iterators that can be run backwards implement DoubleEndedIterator, and so on.

What are you doing in each of the two iterations?

4 Likes

That

fn node_extend_out_nodes<Id, Iter>(&mut self, node_id: Id, out_nodes_iter: Iter)
where
    Iter: IntoIterator<Item = Id>,
    Id: Into<ID!()>,
{
    let out_nodes_iter2: Vec<_> = out_nodes_iter
        .into_iter()
        .map(|id| {
            let id = id.into();
            self.node(id);
            id
        })
        .collect();
    self.node_mut(node_id.into()).extend(out_nodes_iter2);
}

Thanks for advises.

Someone will check me on this, but you should be able to drop the collect there. map is lazy: the iterator is not actually advanced until something calls next, directly or indirectly, so some_iter.map(f) only produces an iterator which has not yet been used, which will map its results one at a time as it's advanced.

Does

fn node_extend_out_nodes<Id, Iter>(&mut self, node_id: Id, out_nodes_iter: Iter)
where
    Iter: IntoIterator<Item = Id>,
    Id: Into<ID!()>,
{
    let out_nodes_iter2: Vec<_> = out_nodes_iter
        .into_iter()
        .map(|id| {
            let id = id.into();
            self.node(id);
            id
        });
    self.node_mut(node_id.into()).extend(out_nodes_iter2);
}

not do what you want?

self.node() borrows, self.node_mut() borrows mutably at the same time

55 |       .map( | id |
   |             ------ immutable borrow occurs here
...
58 |         self.node( id );
   |         ---- first borrow occurs due to use of `*self` in closure
...
63 |       self.node_mut( node_id.into() ).extend( out_nodes_iter2 );
   |       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^         --------------- immutable borrow later used here
   |       |
   |       mutable borrow occurs here

Well, then your problem is not that you want to use the iterator twice. You only need it once. collect() helps not because you need copies of the iterator (or of its elements), but because collecting performs the iteration upfront, ending the immutable borrows before the mutable borrow begins. So cloning the iterator won't help.

The problem is that you are trying to mutate self while also holding immutable borrows into it. That is not possible. If the mutable and immutable borrows refer to disjoint sets of fields of self, then you can rewrite the node() and node_mut() methods in order to only take the fields that are necessary. Or maybe bypass accessor methods completely and use the fields directly, so that the borrow checker can locally reason about the body of the node_extend_out_nodes() function.

3 Likes

Solution with cellect() works. But that solution is not efficient, right?
So, solution with 2 iterators also should work.

fn node_extend_out_nodes<Id, Iter>(&mut self, node_id: Id, out_nodes_iter: Iter)
where
    Iter: IntoIterator<Item = Id>,
    Id: Into<ID!()>,
{
    let out_nodes_iter2: Vec<_> = out_nodes_iter
        .into_iter()
        .map(|id| {
            let id = id.into();
            self.node(id);
            id
        })
        .collect();
    self.node_mut(node_id.into()).extend(out_nodes_iter2);
}

This works.

You can't just make a blanket "it's not efficient" statement. Most code doesn't need to avoid allocations to be fast enough (and allocation doesn't change the asymptotics of the code). Usually, if memory safety requires this (as in this case), it's fine to collect, and only spend time optimizing the function if you find, via actual benchmarking, that it's not fast enough.

But that's exactly my point – using two iterators wouldn't help, because iterators are lazy, so in particular map needs to keep the transform closure around, which results in prolonged immutable borrows even if you clone the iterator itself.

6 Likes

I do realize that it works. I explained in my previous post exactly why it works.

1 Like

Eh.. I see..

You know, variant with clone of the original iterator actually works fine:

fn node_extend_out_nodes<Id, Iter>(&mut self, node_id: Id, out_nodes_iter: Iter)
where
    Iter: IntoIterator<Item = Id>,
    Iter::IntoIter: Clone,
    Id: Into<ID!()>,
{
    let iter = out_nodes_iter.into_iter();
    let iter2 = iter.clone();

    iter.map(|id| {
        let node = self.node(id);
    });

    let iter3 = iter2.into_iter().map(|id| {
        let id = id.into();
        id
    });

    self.node_mut(node_id.into()).extend(iter3);
}

No, it doesn't. It might compile, but it doesn't do the same thing – in particular, it never actually calls self.node(id), again, because map is lazy. This piece of code is equivalent with:

self.node_mut(node_id.into()).extend(out_nodes_iter.into_iter().map(Into::into));

By the way, it compiles because the borrow checker sees that iter is not used after the first iter.map() so it can end the immutable borrow imposed by self.node(id).

1 Like

You are right. It didn't call the first map callback.

fn node_extend_out_nodes<Id, Iter>(&mut self, node_id: Id, out_nodes_iter: Iter)
where
    Iter: IntoIterator<Item = Id>,
    Iter::IntoIter: Clone,
    Id: Into<ID!()>,
{
    let iter = out_nodes_iter.into_iter();
    let iter2 = iter.clone();

    iter.map(|id| {
        let node = self.node(id);
    })
    .fold((), |acc, x| ());

    let iter3 = iter2.into_iter().map(|id| {
        let id = id.into();
        id
    });

    self.node_mut(node_id.into()).extend(iter3);
}

But this code does call both map and no compilation error.

Well, it still doesn't have exactly the same behavior as your original code. It first performs all the self.node(id) calls in one go, and then converts the IDs in another iteration. I don't know whether that matters – it might, because your self type sounds like it's stateful. But if it doesn't, and not interleaving the calls to self.node(id) with id.into() is fine, then you've got a solution.

2 Likes

self.node( id ) just assert the node exists
self.node_mut( id ) returns mutable reference on a node
original code did 2 passes, as I understand
1 pass - check all nodes exist and collect ids into a vector to reuse it
2 pass - mutate a specific node
as I understand new code do the same thing, but does not allocate a new buffer to collect items

Is it fast enough for your needs?

Is there a reason to believe it will be too slow in practice on the data sets you expect to deal with?

Creating a copy isn't necessarily a bad thing. It makes your life as a programmer easier, and assuming the iteration happens in linear time and the function you're using with map runs in constant time, then if it's "fast enough" now, it may be fast enough in application, as well. Your users do not need the absolutely fastest implementation: they need any implementation that solves the problem with the resources (time, memory, etc) that they have.

4 Likes