I have IntoIterator
as argument.
Is that possible to use iterator got from it twice without using collect
and allocating extra memory?
You can require that Iterator: Clone
and only clone the iterator.
In the general case, no - std::iter::Iterator
makes no promises, and requires nothing from implementations, that would allow you to go back to the start of the iteration. To make this concrete, one valid implementation of Iterator might be
struct Counter(u8);
impl std::iter::Iterator for Counter {
type Item = u8;
fn next(&mut self) -> Option<u8> {
let v = self.0;
self.0 += 1;
Some(v)
}
}
While it's clearly possible to reset this iterator (set the value back to zero and call next
again), nothing actually exposed by the type does that. You can also imagine a version where the next
implementation calls an RNG or reads from a file (ignoring IO errors).
However, you can constrain the type of the iterator further (H2CO3 already demonstrated one option) or the type of the IntoIterator. Iterators that can be cloned implement Clone
, iterators that can be run backwards implement DoubleEndedIterator, and so on.
What are you doing in each of the two iterations?
That
fn node_extend_out_nodes<Id, Iter>(&mut self, node_id: Id, out_nodes_iter: Iter)
where
Iter: IntoIterator<Item = Id>,
Id: Into<ID!()>,
{
let out_nodes_iter2: Vec<_> = out_nodes_iter
.into_iter()
.map(|id| {
let id = id.into();
self.node(id);
id
})
.collect();
self.node_mut(node_id.into()).extend(out_nodes_iter2);
}
Thanks for advises.
Someone will check me on this, but you should be able to drop the collect
there. map
is lazy: the iterator is not actually advanced until something calls next
, directly or indirectly, so some_iter.map(f)
only produces an iterator which has not yet been used, which will map its results one at a time as it's advanced.
Does
fn node_extend_out_nodes<Id, Iter>(&mut self, node_id: Id, out_nodes_iter: Iter)
where
Iter: IntoIterator<Item = Id>,
Id: Into<ID!()>,
{
let out_nodes_iter2: Vec<_> = out_nodes_iter
.into_iter()
.map(|id| {
let id = id.into();
self.node(id);
id
});
self.node_mut(node_id.into()).extend(out_nodes_iter2);
}
not do what you want?
self.node()
borrows, self.node_mut()
borrows mutably at the same time
55 | .map( | id |
| ------ immutable borrow occurs here
...
58 | self.node( id );
| ---- first borrow occurs due to use of `*self` in closure
...
63 | self.node_mut( node_id.into() ).extend( out_nodes_iter2 );
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ --------------- immutable borrow later used here
| |
| mutable borrow occurs here
Well, then your problem is not that you want to use the iterator twice. You only need it once. collect()
helps not because you need copies of the iterator (or of its elements), but because collecting performs the iteration upfront, ending the immutable borrows before the mutable borrow begins. So cloning the iterator won't help.
The problem is that you are trying to mutate self
while also holding immutable borrows into it. That is not possible. If the mutable and immutable borrows refer to disjoint sets of fields of self
, then you can rewrite the node()
and node_mut()
methods in order to only take the fields that are necessary. Or maybe bypass accessor methods completely and use the fields directly, so that the borrow checker can locally reason about the body of the node_extend_out_nodes()
function.
Solution with cellect()
works. But that solution is not efficient, right?
So, solution with 2 iterators also should work.
fn node_extend_out_nodes<Id, Iter>(&mut self, node_id: Id, out_nodes_iter: Iter)
where
Iter: IntoIterator<Item = Id>,
Id: Into<ID!()>,
{
let out_nodes_iter2: Vec<_> = out_nodes_iter
.into_iter()
.map(|id| {
let id = id.into();
self.node(id);
id
})
.collect();
self.node_mut(node_id.into()).extend(out_nodes_iter2);
}
This works.
You can't just make a blanket "it's not efficient" statement. Most code doesn't need to avoid allocations to be fast enough (and allocation doesn't change the asymptotics of the code). Usually, if memory safety requires this (as in this case), it's fine to collect, and only spend time optimizing the function if you find, via actual benchmarking, that it's not fast enough.
But that's exactly my point – using two iterators wouldn't help, because iterators are lazy, so in particular map
needs to keep the transform closure around, which results in prolonged immutable borrows even if you clone the iterator itself.
I do realize that it works. I explained in my previous post exactly why it works.
Eh.. I see..
You know, variant with clone of the original iterator actually works fine:
fn node_extend_out_nodes<Id, Iter>(&mut self, node_id: Id, out_nodes_iter: Iter)
where
Iter: IntoIterator<Item = Id>,
Iter::IntoIter: Clone,
Id: Into<ID!()>,
{
let iter = out_nodes_iter.into_iter();
let iter2 = iter.clone();
iter.map(|id| {
let node = self.node(id);
});
let iter3 = iter2.into_iter().map(|id| {
let id = id.into();
id
});
self.node_mut(node_id.into()).extend(iter3);
}
No, it doesn't. It might compile, but it doesn't do the same thing – in particular, it never actually calls self.node(id)
, again, because map
is lazy. This piece of code is equivalent with:
self.node_mut(node_id.into()).extend(out_nodes_iter.into_iter().map(Into::into));
By the way, it compiles because the borrow checker sees that iter
is not used after the first iter.map()
so it can end the immutable borrow imposed by self.node(id)
.
You are right. It didn't call the first map callback.
fn node_extend_out_nodes<Id, Iter>(&mut self, node_id: Id, out_nodes_iter: Iter)
where
Iter: IntoIterator<Item = Id>,
Iter::IntoIter: Clone,
Id: Into<ID!()>,
{
let iter = out_nodes_iter.into_iter();
let iter2 = iter.clone();
iter.map(|id| {
let node = self.node(id);
})
.fold((), |acc, x| ());
let iter3 = iter2.into_iter().map(|id| {
let id = id.into();
id
});
self.node_mut(node_id.into()).extend(iter3);
}
But this code does call both map and no compilation error.
Well, it still doesn't have exactly the same behavior as your original code. It first performs all the self.node(id)
calls in one go, and then converts the IDs in another iteration. I don't know whether that matters – it might, because your self
type sounds like it's stateful. But if it doesn't, and not interleaving the calls to self.node(id)
with id.into()
is fine, then you've got a solution.
self.node( id )
just assert the node exists
self.node_mut( id )
returns mutable reference on a node
original code did 2 passes, as I understand
1 pass - check all nodes exist and collect ids into a vector to reuse it
2 pass - mutate a specific node
as I understand new code do the same thing, but does not allocate a new buffer to collect items
Is it fast enough for your needs?
Is there a reason to believe it will be too slow in practice on the data sets you expect to deal with?
Creating a copy isn't necessarily a bad thing. It makes your life as a programmer easier, and assuming the iteration happens in linear time and the function you're using with map
runs in constant time, then if it's "fast enough" now, it may be fast enough in application, as well. Your users do not need the absolutely fastest implementation: they need any implementation that solves the problem with the resources (time, memory, etc) that they have.