Am I the only one who uses non-fused iterators?

Ok, sorry, I was too snarky. Still, I think you should have cross-posted the link right away.

7 Likes

Thanks you for resolving the potential disagreement cleanly in-thread already. Reviewing our Code of Conduct, I found the following passage quite fitting here (emphasis mine). In particular, thank you @Jules-Bertholet for reacting in the best way possible! :slight_smile: :crab: :smiley_cat:

In the Rust community we strive to go the extra step to look out for each other. Don’t just aim to be technically unimpeachable, try to be your best self. In particular, avoid flirting with offensive or sensitive issues, particularly if they’re off-topic; this all too often leads to unnecessary fights, hurt feelings, and damaged trust; worse, it can drive people away from the community entirely.

And if someone takes issue with something you said or did, resist the urge to be defensive. Just stop doing what it was they complained about and apologize. Even if you feel you were misinterpreted or unfairly accused, chances are good there was something you could’ve communicated better — remember that it’s your responsibility to make your fellow Rustaceans comfortable. Everyone wants to get along and we are all here first and foremost because we want to talk about cool technology. You will find that people will be eager to assume good intent and forgive as long as you earn their trust.

6 Likes

Would you like the PR to be r-'d?

2 Likes

Yes please.

1 Like

Note that the existing iterator adapters and consumers already behave unpredictably for unfused iterators. The most typical behaviours are either implicit fusing, or effectively skipping the None values (e.g. iter::Scan returns None when the inner iterator yields None, but doesn't update its state in this case, thus the None values are effectively skipped and the scan function is applied only to Some(item) values).

However, entirely insane behaviour is also possible. iter::Zip normally returns the iterator with len = min(len(it_a), len(it_b)), but if the zipped iterators are unfused, the zipping desynchronizes: the left iterator is advanced to its next Some(_) value without advancing the right iterator, but if the right iterator also returns None, the left iterator is advanced and the yielded values are silently dropped. This behaviour isn't documented, and is unlikely to be documented. It also doesn't fit any posisble interpretation of zipping.

I expect the adapters in itertools to have even more crazy edge cases for unfused iterators.

3 Likes

It is:

If either iterator returns None, next from the zipped iterator will return None. If the zipped iterator has no more elements to return then each further attempt to advance it will first try to advance the first iterator at most one time and if it still yielded an item try to advance the second iterator at most one time.

2 Likes

You're right. Based on zip's doc:

If either iterator returns None, next from the zipped iterator will return None. If the zipped iterator has no more elements to return then each further attempt to advance it will first try to advance the first iterator at most one time and if it still yielded an item try to advance the second iterator at most one time.

it seems the implementation (or at least the doc) assumes the zipped iterators are fused. But there is no bound for FusedIterator. Since the next doc doesn't require fusing, something is definitely wrong.

The only doc fix I can imagine is to change the zip doc to require that the iterators are fused for predictable behavior. A breaking fix is to add the bounds for FusedIterator.

Isn't the "at most" part incorrect if the zipped iterators are not fused? (And next is called again after the iterator returned by zip returns None).

As far as I can tell, the Zip iterator itself is unfused.

I think what is meant is "at most once per call to next()" ("attempt to advance it").

1 Like

Ah, Ok. If that's true it should be clarified in the doc, but I see your point.

Wait, Zip implements FusedIterator! Oh, but only if the zipped iterators also implement it.

Sorry for the confusion. You're right that the Zip iterator works as documented, technically. However, using it to zip unfused iterators would give strange behavior (the desynchronization that @afetisov mentioned), and that aspect could, probably should, be documented.

Many of the std adapters do behave predictably after next(), only some are problematic. Here is the list of the ones I could find that are weird:

  • Chain as noted earlier.
  • MapWhile is explicitly unfused even with a fused inner iterator (it refuses to commit to its exact behavior), however the otherwise very similar TakeWhile is fused for fused inner iterators.
  • Skip will always stop skipping elements once its inner iterator returns None (weakly implied by docs). SkipWhile does not do this.
  • Take does not fuse internally, though its documentation weakly implies that it does (with the same language as Skip).
  • StepBy will always step by n on all calls to next() except for the first, even those after None is returned.
  • Zip as noted earlier.
  • ArrayChunks (unstable) does not fuse internally, but never updates its remainder after the first time it is set, even if the inner iterator is unfused.
  • Intersperse and IntersperseWith (both unstable) fuse internally (not documented).
  • MapWindows (unstable) fuses (documented).
6 Likes

To me, the reason I've always considered resumable iterators [1] second-class and avoided them is that that's how the ownership checks in the language and library treat them.

for x in my_iter? You can't use it again, because that consumed it.
.extend(my_iter);? You can't use it again, because that consumed it.
my_iter.fold(…)? You can't use it again, because that consumed it.

Pervasively in rust the only things that work on &mut self are the things like find or any or nth that commonly stop before hitting the None that marks the end of iteration.

All those things are pretty clearly telling me that I shouldn't use an iterator after it's been fully iterated once. So I've always felt like "it can return Some after None" was like "you can't trust that Ord is actually a total order", since in my mind if re-iterating was something that I was supposed to think of as normal in iterators, Iterator::count would be &mut self instead of self, for example.


  1. I'm distinguishing that from things that aren't fused but also aren't intended to be used again after returning None ↩︎

5 Likes

It's actually very strange that this is the behaviour that was stabilized and documented. I would expect a more reasonable "both iterators are always polled once; if either returns None, Zip yields None, otherwise it yields Some((item_a, item_b))". I can only assume that the current behaviour was implemented by mistake, and by the time anyone noticed, changing it could be disruptive, so instead the existing behaviour was documented.

3 Likes

It seems likely; I can easily see myself making this mistake— If you neglect the non-fused case, the current behavior is both idiomatic to write and has the advantage of short-circuiting to save computation:

fn next(&mut self) -> Option<(A::Item, B::Item)> {
    let x = self.a.next()?;
    let y = self.b.next()?;
    Some((x, y))
}

The non-desyncing version looks a little more cumbersome:

fn next(&mut self) -> Option<A::Item, B::Item> {
    match (self.a.next(), self.b.next()) {
        (Some(x), Some(y)) => Some((x,y)),
        _ => None
    }
}
1 Like

Except when the iterator is by_ref()'d (or exclusively borrowed). AFAICT the use-case for this is specifically to work around the transfer of ownership for all of these other methods. It's not commonly seen, I guess, but certainly plausible and already well-supported.

7 Likes

This approach probably has small potential performance downsides, too, and furthermore even behavioral ones. We need not only worry about iterators not being fused but also iterators being (re-)borrows of other iterators.

In fact, I wouldn't be surprised if the current documentation quoted above actually was written without any non-fused use-case in mind, instead only focusing on re-borrowed/by_ref iterators. The documentation allows you to put a shorter iterator left and a longer right in the Zip and then gain the guarantee that no items from the longer iterator are discarded. E.g.

zip(5..10, other_iterator.by_ref()).for_each(|_| …)

would be guaranteed not to discard any 6th item from other_iterator whereas

zip(other_iterator.by_ref(), 5..10).for_each(|_| …)

does discard one item.

6 Likes

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.