Am I the only one who uses non-fused iterators?

Most iterators are FusedIterator; once they return None, they will always keep doing so. But they don't have to be:

  • An API produces data. Sometimes there is new data available, sometimes there isn't (but there could be later). This is modeled quite nicely by an un-fused iterator that returns None in the "no new data" case.
    • If you are using async, a Stream/AsyncIterator may be a better choice, bit sometimes you don't want to rearchitect your entire application to be async.
  • A cursor over a cyclic data structure (like linked_list::Cursor)—these can also be modeled as an un-fused iterator, and in fact the Cursor docs make the analogy. (One issue is the lack of a trait for reversing the direction of iteration without changing the position.)

I used an un-fused iterator just yesterday, I think they are neat and a useful tool to have in one's toolbox. However, when I brought up the question of supporting un-fused iteration in the upcoming gen blocks feature, it became apparent that mine is a minority view:

Have you used deliberately unfused iterators before? If so, what for? If not, why not, and what do you prefer instead?

1 Like

TryIter is a notable example that may temporarily return None.

5 Likes

Yes, one use I remember of the top of my head was where I needed something like Windows per-row in a 3d array-esque data structure[1], consumed in a nested loop. Resumability was a natural for the implementation and the consumption. Basically if the current row is consumed, chop up the uniterated slice into the next row and remainder and return None; the subsequent next call will resume iteration in the new row.

I've never considered it odd since I've understood that non-fused iterators have been a part of the design from the start. That's why fusing is it's own method / trait. This came up recently in some other thread and I made a token effort to find the source of not assuming iterators to be fused, but gave up with a shrug after finding references to unfused iterators in discussions from 2013 or so. I.e. I concluded it's been like this in Rust "forever"ish.


  1. triangular actually ↩︎

4 Likes

There was an interesting recent discussion about that:
Idiomatic way of splitting iterator of Result into two iterators (Ok, Err)


The part about fused iterators starts here.

Can you say why the un-fused iterator is better than, say, an iterator that returns options?

I avoid un-fused iterators because there's lots of adapters that will internally fuse things. Like you can't meaningfully .chain(...) on an unfused iterator, because the Chain will never look at it again after the first None.

In general, anything working on generic iterators will try not to look at the iterator after it returns None, because that could panic. And I don't like writing code against concrete iterators -- if it's a concrete type it can use a different API.

6 Likes
  • Better codegen, as no pointless second None check
  • Many iterator adapters and APIs do handle un-fused iterators just fine.
    • You can do a for loop over an unfused iterator, versus having to use while let Some(Some(item)) = ITER.next().
    • You can extend() a Vec with an un-fused iterator (will push all the yielded elements up to the first None onto the vec), doing that with an iterator of options would be more pointless work.
    • Map works with un-fused iterators, with an iterator of options you would have to do a nested map on the option contents (again, doable, but makes code more complex for 0 benefit)

APIs shouldn't panic outside of clearly documented cases, and Iterator::next() has no such documentation, so if an iterator randomly decides to panic after having next() called on it one too many times, that's a bug in the iterator. That being said, "stop looking at the iterator after it returns None once" is the behavior I desire and expect from many Iterator-consuming APIs. For example, with Vec::extend(), my expectation is that the iterator will be consumed only up to the first None; if I want to add more elements from the un-fused iterator onto the Vec, I'll call extend() again.

6 Likes

I've never needed to chain an un-fused iterator, so this doesn't affect me at all, but my naïve expectation would that chain would "wrap around" and start polling the first iterator again after forwarding a None from the second iterator. (I have no idea how performant that would be, and I don't have a use-case for it anyway, but I think chain's behavior, or its lack of guarantee about its behavior, should at least be documented.)

1 Like

Perhaps it would have been clearer to have something like:

trait UnfusedIterator {
  type UnfusedItem;
  fn next() -> Self::UnfusedItem;
}

trait Iterator: UnfusedIterator<UnfusedItem = Option<Self::Item>> {
  type Item;
}

Then the current Iterator trait is basically UnfusedIterator with an Option, but the "normal" trait to use is the fused one.

I could've sworn that calling Iterator::next after the iterator is exhausted was explicitly called out as an error condition and thus allowed to panic, even if it's preferred to provide fused iterators. But checking the docs now, I don't see any reference to that. It's possible I just recall implementation comments noting that this case has to be handled :sweat_smile:

(If you have an iterator of options and want to use it like a usual iterator, that can be done via .map_while(|x| x).)

3 Likes

Why would it? chain is explicitly documented as just yielding values from iterators in order.

I don't consider unfused iterators to really be iterators. They are just an iterator-shaped edge case. The shape of the API cannot prevent their existence, so they are something that one should keep in mind and handle robustly, but they are not what I expect when talking about iteration, and I would eagerly eliminate them with .fuse() if it makes my life simpler. Panicking after returning None is, in my view, an entirely valid implementation of Iterator, even though I'd generally just return None for completed iterators for simplicity and robustness reasons.

The core issue is that an unfused iterator has no way to signal the end of iteration. This means that all such iterators need to be treated as infinite, or require some out-of-band signalling. The former is already a dangerous edge case, while the latter is not something which is encoded in the Iterator API, so is some custom iterator-shaped gimmick.

Also, for me the raison d'être of iterators is to be iterated over with a for-loop. Unfused iterators cannot be used as-is with for-loops. Either you pass the iterator by value, in which case it is effectively implicitly fused, or you need to employ some trick like passing &mut it, which means you're not, strictly speaking, dealing with the original iterator anymore. You're using some adapters to turn an iterator-shaped thing into an actual usable iterator. Another option is manual handling with loop, but at that point you're not really using the Iterator interface anymore, your next() function could return anything.

Similar considerations apply to Vec::extend or .collect() calls. All of them expect the iterators to be properly fused, and will effectively implicitly fuse them, stopping at first None. They couldn't behave in any other way, since they have no other available condition to stop the loop.

I struggle to think of an example where that would be an issue. But even if it is, it just means that your use case is better handled with a nested iterator. It also makes it clear that you expect to continue after the first None value, rather than stop immediately.

I consider the question "can an iterator return Some after a None?" in the same category as "can you implement T::default() as rand::random()?" or "what does a completed Future return from poll()?" (it could e.g. manufacture Drop types out of thin air). The answer is, effectively, "undefined, and you should not ever rely on specific behaviour in that case, but you cannot rule out arbitrary behaviour either, so you must handle it robustly".

6 Likes

An unfused iterator ends iteration like every other iterator: by returning None. The only difference is that this end isn't permanent.

Not a problem, adding &mut is not the end of the world.

And that's great! That's the behavior I want. If I later need to add more elements from the iterator to the Vec (perhaps after learning through some out-of-band signal that the iterator is ready to produce more elements again), I'll call extend a second time at my convenience.


The standard library docs say nothing of the sort:

Returns None when iteration is finished. Individual iterator implementations may choose to resume iteration, and so calling next() again may or may not eventually start returning Some(Item) again at some point.

(Compare Future, where polling after completion is explicitly called out as an error case that's allowed to panic.)

3 Likes

It can signal the end of an iteration, not of all iterations. Unlike a a nested (fused) iterator.

I think this example reveals that an unfused iterator is essentially just a FnMut() -> T, and most relevant Iterator operations can be easily applied to it using normal closure syntax.

2 Likes

The docs also don't require explicitly that next() doesn't panic, and all Rust functions are assumed to be capable of panics, unless explicitly specified otherwise. I can panic in its impl because I have a bug. Why shouldn't I be able to panic because I'm done iterating?

Nothing is the end of the world apart from the actual end of the world, but the fact that your concept doesn't work out-of-the-box as expected and requires some unspecified external knowledge to use properly shows that your interface is not properly designed. You can close your eyes to the fact that you need to hack it into proper shape with implied tricks, or you can design it in the way which makes the proper use obvious. Not every interface allow this kind of clean lego-like design, but Iterator certainly does. I see no reason to do otherwise other than the desire to show off a knowledge of obscure technicalities. Just write your unfused iterator as a proper nested iterator, and make it work like everyone expects iterators to work.

1 Like

Note that there are iterators in std that are unfused, and that work as unfused iterators by design. If we're getting rid of unfused iterators in general, then you need to work out what the correct redesign is for such iterators.

And they have their uses when you want to process all available items in a channel (if any), but do not want to block waiting for a new item in this channel. They're an edge case, sure, but they're a useful edge case.

8 Likes

If an interface panics outside of a clearly documented condition, that is a bug in the interface. As the std docs say:

The panic! macro is used to construct errors that represent a bug that has been detected in your program.

However, unsafe code can't assume arbitrary safe code is free of bugs, and so still needs to be robust to panics.

Many APIs require external knowledge to use properly.

Easier said than done:

let mut iterator_of_iterators = unfused_iterator.convert_into_proper_nested_iterator_like_God_intended();

let mut foo = iterator_of_iterators.next().unwrap();
foo.next();

// We never finished iterating over `foo`,
// but we're already moving on to the next inner iterator?
for item in iterator_of_iterators.next().unwrap() {
   // where do we source `item` from
}

// And now we've switched back to `foo`?
// what do we do?
for item in foo {
    // ...
}
3 Likes

None will be returned when there are no pending values remaining or if the corresponding channel has hung up.

I'd like to know the reasoning why those two entirely different cases were mashed up in a single None value. The source doesn't comment on it in any way. How am I supposed to know when to drop this TryIter, if I can't distinguish empty and closed channel? Looks like a glaring design flaw, which doesn't usually happen in Rust, so there must be some good reason. Why doesn't it return Option<Result<T, TryRecvError>>, for example, with None corresponding to a closed channel?

It could be a bug in the interface, or a misuse of the interface in the calling code. I consider it the latter. If the implementation of Iterator should be non-panicking, it should explicitly say so, like the docs for From do:

Note: This trait must not fail. The From trait is intended for perfect conversions. If the conversion can fail or is not perfect, use TryFrom.

Or AsRef:

Note: This trait must not fail. If the conversion can fail, use a dedicated method which returns an Option<T> or a Result<T, E>.

Many APIs are bad. There is no reason to resort to comments for the facts which can be easily modeled in the type system. Many use cases can't fit this mold, but unfused iterators can.

Ideally we'd get lending iterators and this issue would be solved, but in the meantime one can emulate lending iterators manually, by implementing fn next_iter<'a>(&'a mut self) -> Iter<'a>.

Returning an iterator of options is also always possible. You can't turn it into a generic adapter on unfused iterators since you don't know when to consider the iterator exhausted. But you can avoid unfused iterators in the first place. I doubt that a single extra None check will make a difference in performance, and you need to check each item anyway.

Finally, perhaps your original iterator doesn't care that much about the order of items, so that all individual iterators can coexist and be polled simultaneously. Even if can lead to a bug, one can just document that the iterators must be polled in order. If you're fine with documentation-level correctness for unfused iterators, why would nested iterators be treated differently? The most common case anyway would look like a pair of nested loops, which would work as expected.

P.S.: By the way, shouldn't there be a layout optimization which folds Option<Option<T>> to

enum { SomeSome(T), SomeNone, NoneNone }

so that getting the T would be just a single check?

When I was using TryIter in a toy project yesterday, the "closed channel" case would have been something that could only occur due to a bug in my code. (In fact, I initially implemented a version of TryIter that would panic on a closed channel instead of yielding None, but later decided the extra code wasn't worth it.) Certainly there are situations where you need to distinguish the two cases, but there are also situations where you don't, and TryIter works fine for those.

Not quite:

let mut iterator_of_iterators = unfused_iterator.convert_into_nested_lending_iterator();

let mut foo = iterator_of_iterators.next().unwrap();
drop(foo);

// We never finished iterating over `foo`,
// but we're already moving on to the next inner iterator?
// (which we can still do, even with a lending iterator!)
for item in iterator_of_iterators.next().unwrap() {
   // where do we source `item` from
}

But then you lose the ability to easily use extend() et al.

I agree, the performance issues caused by religiously avoiding unfused iterators are generally minor, it's the ergonomics issues I really care about.

1 Like
2 Likes

How nice of you to post this link here only once the commit is marked for rollup.

Congratulations on your certificate of excellency in bureaucratic authoritarianism, for resolving opinion disputes with legislation.

I had no idea I would get an r+ that fast, that was libs-api's decision not mine. I actually agree that my wording has problems, I think your criticisms on the GitHub thread are valid and I was hoping for more rounds of review before merging. (I cited Cunningham's law for a reason.)

5 Likes