What is an iterator in Rust

May I offer a way past this impasse? https://www.jumpstartyourdreamlife.com/empty-your-cup-zen-proverb/

5 Likes

I would like to reiterate that this is the only actually correct answer to the question in this thread.

If you define a datatype and make it implement the iterator trait, then it’s an iterator, no matter what kind of datatype it is or what its next() method is actually doing.

I am not a friend of trying to define “iterator” in Rust in any other way, since that will almost always give an at least slightly incorrect description. I know this is more of a formalist view, and not too helpful for some people, so I don’t discourage describing either what an iterator in general is, or what an iterator can be, in informal terms with words and comparisons, just be sure to remember that you cannot actually define what Rust considers an iterator this way.




On the topic of comparing this to other languages, I must say I’m not too familiar with the ins and outs of all of these languages, so what I’m observing might be wrong.

In C++, it seems to be extremely hard to understand what exactly an iterator is supposed to be. At least on en.ccppreference.com I couldn’t find a description that I was able to fully understand. What seems to be the case though: there are notions of “iterator” in C++ that only allow a single forward iteration, much like Rusts Iterator. However there is always a “dereference” operation that you are always allowed to to multiple times (as far as I can tell), so that it appears to be the case that C++ iterators always give (something like) pointers to the iterator’s items.

I suppose that the main reason for this is that there is no language except Rust (at least in the following list, plus C++)

that supports the kind of move semantics that Rust does. Heck, the only language here that does not take the “actually, (almost) everything is always behind a reference” approach is C++. And moves in C++ are not supported too well (from my experience). And also, they work differently (and also: less efficiently) ... see, when you want an iterator in C++ to allow its values to be moved out, access through references only is fine, since a move in C++ always leaves behind some valid kind of null-state in the variable that you moved out of.

By the way, let me say that the API for iterators in C++ is terrible. Just try to understand this website with all its preconditions and postconditions, and it doesn’t even mention that (AFAICT) the only way to know if an iterator is at its end is to compare (by ==) agains a differenty iterator that you got from the same collection but by calling .end() instead of .begin().

Nontheless, ignoring the terrible API, it seems to me that C++ iterators really aren’t too different in capabilities from what all the others, C#, Java, Kotlin, JS, Python and Ruby have to offer: you can get references/pointers to elements. Or, actually, there is a huge difference. All of those languages except C++ also have garbage collection. So a reference to something is more than just a way to access the thing while the iterator is still pointing at it. You can keep the reference and it will never become invalid as long as you hold on to it. My takeaway

  • Iterators in all those garbage collected languages mentioned above are very similar. (As far as I can tell.) They would be something like an Iterator<Rc<RefCell<T>>> in Rust (ignoring thread un-/safety).
  • Iterators in C++ allow pointer-like access while the iterator is at the position of the item. I’m having trouble understanding the details, and there are different variantions between input iterator, output iterator, etc.. whatever, but my best guess is that certain interpretations of “iterator” in C++ are like Iterator<&RefCell<T>> in Rust, or perhaps like Iterator<&T> where T: Clone? I really don’t know...

I wouldn’t say C++ is the only odd one out (ignoring the terrible API), Rust is just as different from all those other languages. But in both cases the main differences (still ignoring the terrible API in C++) is due to their approach to memory management, compared to those GC’d languages. So I’d also avoid calling them out on those differences, since they aren’t necessarily for the worst.


Especially I wouldn’t want to call out Rust on being different from, say, Java. The API for Iterator looks very similar, and with the interpretation that Java values are always like Rc<RefCell<T>> types in Rust (ignoring thread un-/safety), an Iterator<Rc<RefCell<T>>> is behaving exactly like Java’s iterator, so Rust’s Iterator is just strictly more general.

11 Likes

I see what you did there. :smiley:

3 Likes

Yeah, totally on purpose. (Actually, I didn’t even notice.)

1 Like

Given that the dictionary definition of "iterate" is "to repeat a process" then it seems to me that is exactly what a Rust iterator does. It has a "next" method that does the next repetition of whatever process. It matters not if we are talking about a process that scans over vectors or generate another random number or whatever.

Meanwhile the dictionary definition of "generate" is "to cause something to exist". That seems to apply to such things as an iterator that produces new random numbers and the like. It perhaps does not apply to an iterator that simply produces a reference to existing elements of an vector (Except perhaps in that it caused a new reference to exists, that seems to be pushing the point a bit).

So, what do I conclude? Iterator, generator, meh, all the same to me. It's just a loop.

Presumably I can define my own iterator traits that go backwards over a vector if I want.

1 Like

At the risk of making this discussion even more needlessly pedantic... I think that C++'s API is awkward exactly because it is the only language mentioned so far whose iterator concept is deliberately cursor-like instead of generator-like. A generator knows for itself when it is exhausted, but to know whether a cursor can be advanced or not, you must compare it to another cursor.

(Yes, some Rust iterators can act like cursors, and C++ iterators can act like generators -- but in order to observe this you have to break the abstraction.)

That's wrong.

I don't know about the others, but C# uses essentially the same iterator concept Rust does. It uses a compiler generated FSM to yield values lazily, iterating over something that may or may not be a collection of in memory data, such as an algorithmically generated infinite sequence.

Iterators in C# and Rust are also far more than just ranges.

That is correct.
As a side note, iterators in C++ are very well described (although I agree one can get overwhelmed by the amount of text). Each type of iterator has clear characteristics and is capable of certain behavior. I believe rust has just a simple forward iterator. If that's so (I'm not sure, but if that is the case) then I'm not surprised that you are unhappy about the amount of knowledge you have to acquire in C++ vs Rust.

That's exactly the point of Rust's strong type system and the simplification Rust Iterators are. To use an Iterator you don't have to know how to handle the pointers you get from a C++ iterator (Am I supposed to free the pointers? If I store some of the pointers will they still be valid in the future? When do I know the iterator is at its end?).

Instead you know when an Iterator is exhausted because it will return Option::None. The type system tells you what you can do with the values returned from the Iterator. If there are owned values you can move them around how you like. If there are references to your original collection (like Vec::iter() does) you know you are guaranteed that the references are valid as long as they are exist. Furthermore, Rust prevents you from modifying the original collection (or whatever source) as long as the references live. You can even prevent users from storing Iterator elements by returning a &mut T because you will still borrow the Iterator mutably which prevents you from calling next() (it requires a mutable reference to the Iterator itself).

Therefore, for me the Rust Iterator is not about how it can be compared to the concepts of other languages. In the context of Rust and its type system they are just incredible easy to use. The compiler will tell you if you're going to wrongly use an Iterator. I don't need to read pages of documentation if I want to use Iterators. They just work.

I think it will be more correct to say that only the simple forward iterator is called an "iterator" in Rust. For anything else, we'll either have other name (if this abstraction is commonly useful) or none at all (if it is applicable only in some relatively rare cases).

I would argue that this is not the case. C++20 basically introduces a new parallel definition of the C++ iterator hierarchy — std::*_­iterator vs. Cpp17*Iterator — with subtle differences. Most the algorithms are replicated in std::ranges to use the new definition, basically providing a std v2.0 in a "backward-compatible" way. (Apparently, we can't expect libraries to upgrade to the new definition for the foreseeable future.)

I see this as an attempt to rectify some ill-specified parts of the traditional iterators (for instance, the requirement that forward iterators return references is often considered over-specification), so that iterators work well with concepts; still, many requirements are specified in terms of "modeling," the violation of which results in undefined behavior.

In conclusion, although C++ iterators are mostly well-defined (for the purpose of language-lawyering), I do not consider C++ to be a great example that can be used to judge other languages as "weird" or "poorly named" when it comes to the specification of iterators. I also don't know of any language with a similar specification of iterators.

Regarding iterator vs. generator

In the rest of this thread, I'll be using Python as an example:
  • indeed, contrary to Rust, its concept of generators and their implementation has long been stabilized within the language;

  • also, its pseudo-code notation should help abstract a bit about the specifics of each language;

  • it should also alleviate a bit of the tension felt in this thread, regarding C++ vs. Rust, by taking a more neutral language. Python also happens to be one of the most widespread languages, so it can be expected that its semantics will be those that most programmers have in mind.

Let's imagine a generator, within a wave-handed / high-level description:

it's an object that (lazily) generates / produces / yields items.

In Python, for instance, you can write one as follows:

def prime_divisors (n: int) -> Iterator[int]:
    p: int = 2
    while n > 1:
        if n % p == 0:
            yield p
            n //= p
        else:
            p += 1
    return None  # Value returned when done yielding elems

I think we can all agree that the above is not a range, and does not look like a C++ iterator either.

This yield generator can be polled using a specific interface / API:

divisors: Iterator[int] = prime_divisors(12)  # Not a list nor a tuple yet!
assert next(divisors) == 2
# next(iterator) is sugar for iterator.__next__()
assert divisors.__next__() == 2
assert next(divisors) == 3
try:
    next(divisors)
    assert False  # unreachable!
except StopIteration:
    pass  # Hit iterator exhaustion

In practice, implementing this .__next__() method that returns an element or throws a StopIteration exception is sufficient for us to iterate over it (see below for an example).

Implementors of such poll-based API are then called iterators, and we see that our yield generator is thus an iterator too. Whether there is a distrinction between these two notions is mainly a matter of semantics and conventions; in Python and Rust, a generator is a compiler-generated iterator, written using yield statements within a "function".

  • there are more advanced question regarding generators in these two languages, such as the value returned after yielding (in Rust), and as to whether we can feed values into the generator when polling it (at which point we reach the realm of coroutines).

So, from this, there are two things that are nice to observe:

First, that we can manually implement the Iterator API by holding some captured state within our object / closure:

  • def prime_divisors (n: int) -> Iterator[int]:
        p: int = 2
        def next(self):
            nonlocal p, n  # State of the closure
            while n > 1:
                if n % p == 0:
                    n //= p
                    return p
                else:
                    p += 1
            raise StopIteration
        # return an anonymous / ad-hoc object with this __next__ method
        return type(__name__, (object,), { '__next__': next })()
    
  • Object-oriented style

    class prime_divisors(collections.Iterator):
        # State
        n: int
        p: int
    
        def __init__(self, n: int) -> None:
            # Constructor: initial state
            self.n = n
            self.p = 2
        
        # Yield an element:
        def __next__(self) -> int:
            while self.n > 1:
                if self.n % self.p == 0:
                    self.n //= self.p
                    return self.p
                else:
                    self.p += 1
            raise StopIteration
    

And second, that such API is indeed sufficient for iteration:

divisors = prime_divisors(12)
while True:
    try:
        d = next(divisors)
    except StopIteration:
        break
    # loop body
    print(d)
    # we could continue / break here...

is the unsugared version1 of:

for d in prime_divisors(12):
    # loop body
    print(d)
    # we could continue / break here...

1 Well, technically there is another layer, between Iterable and Iterator, that allows to have a function call to instance an iterator out of our good old collections...


I hope by now it is obvious that what an Iterator is in Python, in Rust (c.f., this thread), and in many other languages: it is actually the simplest polling-based API possible; and it turns out that it is enough for simple iteration, and thus, all the algorithms that can be based on it.

7 Likes

I couldn't agree more.

While that is typically the case, and absolutely true for FusedIterator, it isn't necessarily the case for Iterator. Here in the playground is an iterator, Delay which returns values after None,

fn main() {
    let mut foo = Delay::delay(vec![5], 2);
    assert_eq!(foo.next(), None);
    assert_eq!(foo.next(), None);
    assert_eq!(foo.next(), Some(5));
}

That's why it is not iterator, but generator. But it is simply dumped in the same bag of naming for convenience.

I guess the term "generator" in programming language lingo is usually reserved for functions which can "yield". This concept is similar to the Iterator interface, but you don't have to take care of storing intermediate state yourself between yield points. So it would be confusing to call something a generator which does not fit the usual programming language jargon, especially since Rust will probably get generators eventually (they are already implemented and used internally, just not user visible yet).

(Comparing to other languages is useless so staying away from those.)

I'm going to disagree, (which goes against @steffahn and others.) What your showing is conceptually broken.

It is not enough that the language (compiler) allows you to write it, idiomatic code requires more than a dumb compiler can know about.

From the doc for next "Returns None when iteration is finished." You break the concept of finished. In the same way writing code out=in.into().into().into().into() isn't the way to perform processing and misses the point of conversion trait.

1 Like

Well, that’s a weird way to cite the docs. It actually says:

Returns None when iteration is finished. Individual iterator implementations may choose to resume iteration, and so calling next() again may or may not eventually start returning Some(Item) again at some point.

1 Like

True in general, not for Iterators in particular though, as long as all you do is implement next(). Otherwise those extra conditions that make up an iterator would probably need to be stated in the docs for std::iter::from_fn, but nope, any function/closure will do!

I’m saying “as long as all you do is implement next()” because if you implement / re-implement any of the other (optional) iterator methods, there are in fact more things to pay attention to (in particular for size_hint, and most methods should get implementations that do the same, but perhaps more efficiently than the default implementation, but also for example step_by gives more options on how to implement it).

I would ultimately argue that the option of implementing these methods is not essential to understand what an Iterator is in Rust, though.

So it's a loop then?