When lazy iterators don't iterate

There were several places in my (small) program where I was using iterators to go through things. However, I found that the iterators were not iterating.
In one case, I was using an iterator to search for a name in a vector, and extract the position and whether I found the name. As a for loop, easy. As an iterator (with the same body, just in a .map(|arg| ...)), the iterator never called the map closure.
In the second case, I was using the iterator to:

    short.iter().map(|ls| {
            if !hm.contains_key(&ls.label) {
                hm.insert(ls.label.clone(), ls.strings.clone());
            } else {
                let mut v1 = hm.get(&ls.label).unwrap().clone();
                v1.extend(ls.strings.clone());
                hm.insert(ls.label.clone(), v1);
            }
    });

Where short is a vector of LabeledStrings (a structure containing a label (string) and a vector of strings). As a for loop, it does exactly what I want. As a .map, it does nothing. Okay. I will use the for loop. But I am clearly misunderstanding something about using iterators. Is the problem that I am copying everything and therefore not consuming the iterator?
Thanks for any explanations,
Joel

The problem is that map produces a new iterator that is itself meant to be used in some way. In your case that’s an iterator of () items, since the closure doesn’t return any value, but it’s an iterator nonetheless.

If you just want to run a closure for each element of the iterator and then return nothing, simply use for_each instead of map :wink:
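To make the difference concrete, here is a minimal sketch. The function names are made up for illustration and are not from the original program. The first function builds a map adapter and drops it without consuming it, so the closure never runs; the second drives the iterator with for_each. Note that binding the result with `let _ =` also silences the "unused" warning, which may be one way such a warning goes missing.

```rust
/// Builds a `map` adapter but never consumes it, so the closure never runs.
fn visits_with_unused_map(items: &[i32]) -> usize {
    let mut visits = 0;
    // `let _ =` drops the lazy adapter immediately and hides the unused warning.
    let _ = items.iter().map(|_| visits += 1);
    visits
}

/// Drives the iterator to completion, so the closure runs once per item.
fn visits_with_for_each(items: &[i32]) -> usize {
    let mut visits = 0;
    items.iter().for_each(|_| visits += 1);
    visits
}

fn main() {
    let items = [10, 20, 30];
    println!("map only: {}", visits_with_unused_map(&items)); // prints "map only: 0"
    println!("for_each: {}", visits_with_for_each(&items)); // prints "for_each: 3"
}
```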


Thank you. for_each would do exactly what I was looking for in these cases.
I will say that the runtime failure from using .map is rather "interesting" to diagnose. But now I know.
Yours,
Joel

It should usually generate a warning though, if you don’t use an iterator produced by map.

By the way, you can probably simplify the code to something along the lines of

short.iter().for_each(|ls| {
    hm.entry(ls.label.clone())
        .or_default()
        .extend(ls.strings.iter().cloned());
});
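For reference, a self-contained version of that snippet might look like the sketch below. The LabeledStrings definition is an assumption based on the description in the question, and merge_by_label and sample are hypothetical helper names.

```rust
use std::collections::HashMap;

// Assumed shape of the struct, based on the description in the question.
struct LabeledStrings {
    label: String,
    strings: Vec<String>,
}

/// Collects all strings that share a label into one vector per label.
fn merge_by_label(short: &[LabeledStrings]) -> HashMap<String, Vec<String>> {
    let mut hm: HashMap<String, Vec<String>> = HashMap::new();
    short.iter().for_each(|ls| {
        // `entry(..).or_default()` inserts an empty Vec the first time a label is seen.
        hm.entry(ls.label.clone())
            .or_default()
            .extend(ls.strings.iter().cloned());
    });
    hm
}

/// Hypothetical sample data with a repeated label.
fn sample() -> Vec<LabeledStrings> {
    vec![
        LabeledStrings { label: "a".into(), strings: vec!["x".into()] },
        LabeledStrings { label: "b".into(), strings: vec!["y".into()] },
        LabeledStrings { label: "a".into(), strings: vec!["z".into()] },
    ]
}

fn main() {
    let hm = merge_by_label(&sample());
    println!("{:?}", hm.get("a")); // prints Some(["x", "z"])
}
```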

I think most Rust programmers would expect this to read

for ls in short { ... }

Yes, you could use for_each, but it usually doesn't read as well.

I must have done something particularly awkward, as there was no compiler warning about the unused map-produced iterator. Ah well. The for loop works. And for cases where I want to use the iterator, I can use for_each now.
Thanks for all the replies,
Joel

Just FYI (if you don't already know): for a in b { /* do stuff */ } in Rust is fully based on iterators. It is roughly equivalent to[1]

let mut iter = b.into_iter();
while let Some(a) = iter.next() {
    // do stuff
}

meaning that even if b is an iterator and not a collection, it is usually preferable to just write the for ... in b {} loop instead of using b.for_each(|...| {}). The only case where I could see myself using the Iterator::for_each method directly is if I have a pre-defined function that I want to apply to each item of the iterator, like in .for_each(my_function).

So even if you are using the for .. in ... {} syntax, you are still using iterators :slight_smile:

As an aside:

As a convention, types / collections that implement the .iter() method also implement the IntoIterator trait for a reference to themselves, like impl IntoIterator for &'_ MyCollection { }, which, per convention, is equivalent to calling the .iter() method. This means that the following are usually equivalent:

  1. my_collection.iter().map(...)... and (&my_collection).into_iter().map(...)...
  2. and, by extension also in for loops: for a in my_collection.iter() { ... } and for a in &my_collection { ... }.

The for ... in ... { ... } syntax "implicitly" calls .into_iter() for you, so the first one is actually my_collection.iter().into_iter(), which is just the iterator returned by .iter()[2].


  1. See the Rust reference

  2. Per this implementation in the standard library.
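The equivalences described above can be checked with a small sketch. The function names here are invented for illustration; all three walk the same Vec by reference and should produce the same sum.

```rust
/// Iterates with an explicit `.iter()` call.
fn sum_via_iter(v: &Vec<i32>) -> i32 {
    let mut total = 0;
    for x in v.iter() {
        total += *x;
    }
    total
}

/// Iterates over `&Vec<i32>` directly; the `for` loop calls `into_iter()` on the reference.
fn sum_via_ref(v: &Vec<i32>) -> i32 {
    let mut total = 0;
    for x in v {
        total += *x;
    }
    total
}

/// Roughly what the `for` loop expands to, written out by hand.
fn sum_desugared(v: &Vec<i32>) -> i32 {
    let mut total = 0;
    let mut iter = v.into_iter();
    while let Some(x) = iter.next() {
        total += *x;
    }
    total
}

fn main() {
    let v = vec![1, 2, 3, 4];
    println!("{}", sum_via_iter(&v)); // prints 10
    println!("{}", sum_via_ref(&v)); // prints 10
    println!("{}", sum_desugared(&v)); // prints 10
}
```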

Note that for many iterators, for_each is actually not fully equivalent to a simple loop with calls to next. Instead, for_each is more restrictive in how it can be used, and in turn can sometimes thus be better optimized.

Things you cannot do with for_each:

  • stop iteration early! With next, you can just stop calling next early, and decide not to consume any remaining elements (meaning, since iterators are lazy, this also may skip significant work - and also side-effects - of creating those elements in the first place)
    • the iterator API does offer the try_for_each API, for a higher-order function/combinator that does support aborting the iteration early
    • for_each can be more efficient than try_for_each or manual use of next for iterators that do complex logic with case distinctions, I’ll mention some more concrete examples later
    • of course, for_each can be stopped… via panic (if unwinding is even enabled in your build), but since for_each also consumes the iterator by-value, the more precise characterization may be that for_each cannot be used to resumably stop iteration early ₍^. .^₎⟆
  • do extra things, move the iterator elsewhere, etc, in between consumption of individual items! With next, you can store the iterator somewhere, move it around and/or move around your control flow to other parts of the program, or maybe even other threads. Neither for_each nor try_for_each support this.
    • A good indicator of this difference is that try_for_each cannot do actions that are async, whereas within the body of a for loop, of course, you can use async/await without any issues.
    • The extra power that both for_each and try_for_each thus have is that they can make use of extra stack space during the ongoing iteration. For instance, you can implement some recursive logic of generating iterator items as a truly recursive function. Only when the iteration ends early (and hence only for try_for_each, not for_each) you’ll have to convert this extra state into data that is stored within the Iterator object itself, but try_for_each only has to do it once per call to try_for_each, whenever the iteration ends (or ends early).
    • Using next instead means such an iterator will need to persist all state information inside of the Iterator object directly on every single call to next, i.e. for every single item; and moreover, every call to next would start with some logic that needs to read this state out of the iterator in order to jump back to the right branch in the iterator’s logic/control flow.

Of course, try_for_each can be used to do everything that next can do, if you pass it a closure that always “errors” unconditionally. (You could default-implement next in terms of try_for_each.) But one probably wouldn’t be as surprised by the fact that that may result in inefficiencies as by the same kind of fact for the seemingly innocuous “next” method.

The Iterator trait’s documentation may talk a bit more of try_fold than try_for_each, since the former is an even more generalized method that offers state being passed around by-value, not just accessed by mutable reference, but the above discussion applies equally to both.
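As a small illustration of the early-stop point, here is a sketch using try_for_each with std::ops::ControlFlow. The function name stop_at_first_negative is made up for this example. Since try_for_each takes the iterator by mutable reference, whatever was not consumed is still available afterwards.

```rust
use std::ops::ControlFlow;

/// Stops at the first negative value and returns it, together with
/// the items that were left unconsumed in the iterator.
fn stop_at_first_negative(values: &[i32]) -> (Option<i32>, Vec<i32>) {
    let mut iter = values.iter().copied();
    // `Break` aborts the iteration early; `Continue` moves on to the next item.
    let outcome = iter.try_for_each(|v| {
        if v < 0 {
            ControlFlow::Break(v)
        } else {
            ControlFlow::Continue(())
        }
    });
    let found = match outcome {
        ControlFlow::Break(v) => Some(v),
        ControlFlow::Continue(()) => None,
    };
    // The iterator was only borrowed mutably, so the remainder can still be collected.
    let rest: Vec<i32> = iter.collect();
    (found, rest)
}

fn main() {
    let (found, rest) = stop_at_first_negative(&[1, 2, -3, 4, 5]);
    println!("{:?} {:?}", found, rest); // prints Some(-3) [4, 5]
}
```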


Affected iterators are usually those that do different kinds of iteration subsequences in sequence. The typical example is a foo.chain(bar) chain of iterators. The resulting Chain iterator will contain a bool flag (or more specifically, it’s wrapping iterators into Options) to track which of the 2 iterators it’s currently in. But for_each or try_for_each will never have to re-check this flag, whereas a loop of next calls will re-check for every single item. Another infamous case is iterating over right-closed ranges i..=j; iterating over these is less efficient than iterating over something like i..(j+1) instead (of course the latter only works if j+1 doesn’t overflow), but a majority of this inefficiency may actually go away if for_each is used.
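For comparison, here are the two styles side by side on a chained iterator, as a rough sketch with invented function names. Both compute the same result; whether the for_each version is actually faster depends on the iterator and the optimizer, so it is worth measuring rather than assuming.

```rust
/// External iteration: the `for` loop calls `next` on the `Chain` once per item.
fn sum_with_for_loop(a: &[u64], b: &[u64]) -> u64 {
    let mut total: u64 = 0;
    for x in a.iter().chain(b.iter()) {
        total += *x;
    }
    total
}

/// Internal iteration: `for_each` lets `Chain` run through each half in turn.
fn sum_with_for_each(a: &[u64], b: &[u64]) -> u64 {
    let mut total: u64 = 0;
    a.iter().chain(b.iter()).for_each(|x| total += *x);
    total
}

fn main() {
    let a = [1, 2, 3];
    let b = [10, 20];
    println!("{}", sum_with_for_loop(&a, &b)); // prints 36
    println!("{}", sum_with_for_each(&a, &b)); // prints 36
}
```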


Thank you both for the detailed additional explanations.
Yours,
Joel

i..=j and i..(j+1) look the same to me. What makes it produce different code? I thought that the difference would maybe only be using a single comparison instruction, but still a single instruction at each iteration.

The key difference in functionality is that 0..=u8::MAX is an iterator that produces u8::MAX, but 0..(u8::MAX + 1) is not. The ability to go right up to the last representable value of the type is what requires the extra code.
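A short sketch of that edge case, with hypothetical helper names: the inclusive range yields all 256 values of u8, while the exclusive upper bound u8::MAX + 1 cannot even be represented as a u8.

```rust
/// Number of items produced by the inclusive range over all of `u8`.
fn inclusive_count() -> usize {
    (0..=u8::MAX).count()
}

/// Last item of the inclusive range: the maximum value itself.
fn inclusive_last() -> Option<u8> {
    (0..=u8::MAX).last()
}

/// The exclusive bound `u8::MAX + 1` overflows, so `checked_add` returns `None`.
fn exclusive_upper_bound() -> Option<u8> {
    u8::MAX.checked_add(1)
}

fn main() {
    println!("{}", inclusive_count()); // prints 256
    println!("{:?}", inclusive_last()); // prints Some(255)
    println!("{:?}", exclusive_upper_bound()); // prints None
    println!("{}", (0..u8::MAX).count()); // prints 255: the half-open range stops short of u8::MAX
}
```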
