The heart of the matter here is:
- Everything in C++'s
algorithm
performs internal iteration. That is to say, they all have the for
loop inside the function.
- Rust's Iterator API tries to provide a lot of good tools for external iteration. The
for
loop is intended to exist either in the caller, or in one of very few internal iteration methods. (basically, fold
)
The ironic part is, C++ has external iterators (its concept of an iterator
is an external iterator!). But for some reason it gives them no love!
There's a number of advantages to external iteration:
1. They compose well.
Let's say you want to know the total cost of all events that happen on a Tuesday:
events.iter()
.filter(|event| event.is_on_day(Weekday::Tuesday))
.map(|event| event.cost)
.sum()
This runs in a single for loop without having to allocate vectors for intermediate results (or worse, having to name all the temporary variables!).
In C++, every container type needs to come with its own reverse iterator. In Rust, you can just do .rev()
, oftentimes even after doing other things!
// turn [A, B, C, D, E]
// into [(4, A), (3, B), (2, C), (1, D), (0, E)]
things.rev().enumerate().rev()
2. They can be infinite/don't need to fit in memory
C++'s std::iota
"fills a range with successive increments of the starting value."
Our equivalent of that in Rust is simply 0..
. You could implement enumerate
in terms of this:
fn enumerate(iter: impl Iterator<Item=T>) -> impl Iterator<Item=(usize, T)> {
(0..).zip(iter)
}
3. They are lazy. (think short-circuiting!)
strs.map(|s| s.parse::<i64>())
.collect::<Result<Vec<_>, _>>()
This attempts to parse each string in an iterator, stopping immediately after the first error. Of course, C++ can do that with exceptions, but exceptions bring many problems of their own; this shows how we don't need them in Rust.
4. After optimization, they're just as fast as external iteration.
Usually.
Sometimes when writing your own Iterator type, you may need to override fold
(basically the fundamental "internal iteration" method) in order to help the compiler see work that can be lifted out of the loop.
So what are the downsides? Well:
- They require more compile time.
- You can end up with some crazy-looking or difficult-to-name types.
P.S. here's one of algorithm
's golden nuggets of design that I ran into while trying to write this post:
-
find - finds the first occurrence of an element
-
find_end - finds the last occurrence of a subsequence
-
search - finds the first occurrence of a subsequence