Ok, sorry, I was too snarky. Still, I think you should have cross-posted the link right away.
Thanks you for resolving the potential disagreement cleanly in-thread already. Reviewing our Code of Conduct, I found the following passage quite fitting here (emphasis mine). In particular, thank you @Jules-Bertholet for reacting in the best way possible!
In the Rust community we strive to go the extra step to look out for each other. Don’t just aim to be technically unimpeachable, try to be your best self. In particular, avoid flirting with offensive or sensitive issues, particularly if they’re off-topic; this all too often leads to unnecessary fights, hurt feelings, and damaged trust; worse, it can drive people away from the community entirely.
And if someone takes issue with something you said or did, resist the urge to be defensive. Just stop doing what it was they complained about and apologize. Even if you feel you were misinterpreted or unfairly accused, chances are good there was something you could’ve communicated better — remember that it’s your responsibility to make your fellow Rustaceans comfortable. Everyone wants to get along and we are all here first and foremost because we want to talk about cool technology. You will find that people will be eager to assume good intent and forgive as long as you earn their trust.
Would you like the PR to be r-
'd?
Yes please.
Note that the existing iterator adapters and consumers already behave unpredictably for unfused iterators. The most typical behaviours are either implicit fusing, or effectively skipping the None
values (e.g. iter::Scan
returns None
when the inner iterator yields None
, but doesn't update its state in this case, thus the None
values are effectively skipped and the scan function is applied only to Some(item)
values).
However, entirely insane behaviour is also possible. iter::Zip
normally returns the iterator with len = min(len(it_a), len(it_b))
, but if the zipped iterators are unfused, the zipping desynchronizes: the left iterator is advanced to its next Some(_)
value without advancing the right iterator, but if the right iterator also returns None
, the left iterator is advanced and the yielded values are silently dropped. This behaviour isn't documented, and is unlikely to be documented. It also doesn't fit any posisble interpretation of zipping.
I expect the adapters in itertools
to have even more crazy edge cases for unfused iterators.
If either iterator returns
None
,next
from the zipped iterator will returnNone
. If the zipped iterator has no more elements to return then each further attempt to advance it will first try to advance the first iterator at most one time and if it still yielded an item try to advance the second iterator at most one time.
You're right. Based on zip's doc:
If either iterator returns
None
,next
from the zipped iterator will returnNone
. If the zipped iterator has no more elements to return then each further attempt to advance it will first try to advance the first iterator at most one time and if it still yielded an item try to advance the second iterator at most one time.
it seems the implementation (or at least the doc) assumes the zipped iterators are fused. But there is no bound for FusedIterator. Since the next
doc doesn't require fusing, something is definitely wrong.
The only doc fix I can imagine is to change the zip
doc to require that the iterators are fused for predictable behavior. A breaking fix is to add the bounds for FusedIterator.
Isn't the "at most" part incorrect if the zipped iterators are not fused? (And next
is called again after the iterator returned by zip
returns None).
As far as I can tell, the Zip iterator itself is unfused.
I think what is meant is "at most once per call to next()
" ("attempt to advance it").
Ah, Ok. If that's true it should be clarified in the doc, but I see your point.
Wait, Zip implements FusedIterator! Oh, but only if the zipped iterators also implement it.
Sorry for the confusion. You're right that the Zip iterator works as documented, technically. However, using it to zip unfused iterators would give strange behavior (the desynchronization that @afetisov mentioned), and that aspect could, probably should, be documented.
Many of the std adapters do behave predictably after next()
, only some are problematic. Here is the list of the ones I could find that are weird:
Chain
as noted earlier.MapWhile
is explicitly unfused even with a fused inner iterator (it refuses to commit to its exact behavior), however the otherwise very similarTakeWhile
is fused for fused inner iterators.Skip
will always stop skipping elements once its inner iterator returnsNone
(weakly implied by docs).SkipWhile
does not do this.Take
does not fuse internally, though its documentation weakly implies that it does (with the same language asSkip
).StepBy
will always step byn
on all calls tonext()
except for the first, even those afterNone
is returned.Zip
as noted earlier.ArrayChunks
(unstable) does not fuse internally, but never updates itsremainder
after the first time it is set, even if the inner iterator is unfused.Intersperse
andIntersperseWith
(both unstable) fuse internally (not documented).MapWindows
(unstable) fuses (documented).
To me, the reason I've always considered resumable iterators [1] second-class and avoided them is that that's how the ownership checks in the language and library treat them.
for x in my_iter
? You can't use it again, because that consumed it.
.extend(my_iter);
? You can't use it again, because that consumed it.
my_iter.fold(…)
? You can't use it again, because that consumed it.
Pervasively in rust the only things that work on &mut self
are the things like find
or any
or nth
that commonly stop before hitting the None
that marks the end of iteration.
All those things are pretty clearly telling me that I shouldn't use an iterator after it's been fully iterated once. So I've always felt like "it can return Some after None" was like "you can't trust that Ord
is actually a total order", since in my mind if re-iterating was something that I was supposed to think of as normal in iterators, Iterator::count
would be &mut self
instead of self
, for example.
I'm distinguishing that from things that aren't fused but also aren't intended to be used again after returning
None
↩︎
It's actually very strange that this is the behaviour that was stabilized and documented. I would expect a more reasonable "both iterators are always polled once; if either returns None
, Zip
yields None
, otherwise it yields Some((item_a, item_b))
". I can only assume that the current behaviour was implemented by mistake, and by the time anyone noticed, changing it could be disruptive, so instead the existing behaviour was documented.
It seems likely; I can easily see myself making this mistake— If you neglect the non-fused case, the current behavior is both idiomatic to write and has the advantage of short-circuiting to save computation:
fn next(&mut self) -> Option<(A::Item, B::Item)> {
let x = self.a.next()?;
let y = self.b.next()?;
Some((x, y))
}
The non-desyncing version looks a little more cumbersome:
fn next(&mut self) -> Option<A::Item, B::Item> {
match (self.a.next(), self.b.next()) {
(Some(x), Some(y)) => Some((x,y)),
_ => None
}
}
Except when the iterator is by_ref()
'd (or exclusively borrowed). AFAICT the use-case for this is specifically to work around the transfer of ownership for all of these other methods. It's not commonly seen, I guess, but certainly plausible and already well-supported.
This approach probably has small potential performance downsides, too, and furthermore even behavioral ones. We need not only worry about iterators not being fused but also iterators being (re-)borrows of other iterators.
In fact, I wouldn't be surprised if the current documentation quoted above actually was written without any non-fused use-case in mind, instead only focusing on re-borrowed/by_ref
iterators. The documentation allows you to put a shorter iterator left and a longer right in the Zip
and then gain the guarantee that no items from the longer iterator are discarded. E.g.
zip(5..10, other_iterator.by_ref()).for_each(|_| …)
would be guaranteed not to discard any 6th item from other_iterator
whereas
zip(other_iterator.by_ref(), 5..10).for_each(|_| …)
does discard one item.
This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.