Early breaking of iterator

Hi,
I often want to apply some operations to a collection but interrupt on the first error, or on the first occurrence of some condition, and return it, as here:

fn example(s: &str) -> bool {
  let sum = s.chars()
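    // Intended: leave `example` on the first invalid char. As written this does
    // not compile: `return` here would return from the closure, not from `example`.
    .map(|c| if is_valid(c) { c } else { return false })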
    .map(|c| some_function(c))
    .filter(|c| another_function(c))
    .sum();
  .....
}

or, in cases like this one:

fn example(s: &str) -> bool {
  let sum = if let Some(v) = s.chars()
    .check(|c| is_valid(c)) // hypothetical adapter; Iterator has no `check` method
    .map(|c| some_function(c))
    .filter(|c| another_function(c))
    .sum() {
    v
  } else {
    return false
  };
  .....
}

One could obviously and easily implement each case with loops instead of iterators, but I'd like to know if there is an idiomatic way to implement both cases using an iterator.
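For comparison, here is a minimal sketch of the loop-based version. The thread never defines `is_valid`, `some_function` or `another_function`, so the ones below are hypothetical stand-ins (digits are valid, map a digit to its value, keep even values), and the function returns the `Option` directly instead of the elided `.....` part.

```rust
// Hypothetical stand-ins for the undefined helpers in the question.
fn is_valid(c: char) -> bool {
    c.is_ascii_digit()
}

fn some_function(c: char) -> u32 {
    // Only called on chars that passed `is_valid`, so this cannot panic.
    c.to_digit(10).unwrap()
}

fn another_function(n: &u32) -> bool {
    *n % 2 == 0
}

/// Loop version: `None` on the first invalid char, otherwise the sum of
/// the values that pass `another_function`.
fn example_sum(s: &str) -> Option<u32> {
    let mut sum = 0;
    for c in s.chars() {
        if !is_valid(c) {
            return None; // early exit from the whole function
        }
        let v = some_function(c);
        if another_function(&v) {
            sum += v;
        }
    }
    Some(sum)
}
```

The early `return` works here because it is not inside a closure, which is exactly what the iterator version lacks.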

You could use Iterator::take_while.
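A minimal sketch of the `take_while` suggestion, assuming a hypothetical `is_valid` that accepts ASCII digits. It shows the behavior discussed further down: the iteration simply ends at the first invalid char, the valid prefix is still processed, and the caller gets no signal that anything was cut off.

```rust
// Hypothetical validity check standing in for `is_valid`.
fn is_valid(c: &char) -> bool {
    c.is_ascii_digit()
}

/// Sums digits up to the first invalid char. `take_while` just stops there;
/// the result does not tell the caller whether it stopped early.
fn sum_valid_prefix(s: &str) -> u32 {
    s.chars()
        .take_while(is_valid)
        .map(|c| c.to_digit(10).unwrap())
        .sum()
}
```

For example, `"12a3"` yields `3` (from `1 + 2`), indistinguishable from a fully valid input `"12"`.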


You could use itertools::process_results.

Something like:

use itertools::process_results;

fn example(s: &str) -> bool {
  let sum = if let Some(v) = process_results(s.chars()
    .map(|c| is_valid(&c).then(|| c).ok_or(())), |iter| iter
    .map(|c| some_function(c))
    .filter(|c| another_function(c))
    .sum()).ok() {
    v
  } else {
    return false
  };
  .....
}

In case this doesn’t compile, give me a complete code example with the missing definitions added and I can debug it.


The let sum = if let Some(v) = … { v } else { return … }; pattern will become syntactically nicer once the recently accepted RFC 3137 (let-else) is implemented and stabilized.

In the meantime, I would probably reuse the same variable name, i.e. let sum = if let Some(sum) = … { sum } else { return … };
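RFC 3137 has since been stabilized as `let ... else` (Rust 1.65). A minimal sketch of the pattern, assuming a hypothetical body where every char must be a decimal digit and the elided `.....` part just checks whether the digit sum is even:

```rust
/// Returns `false` if any char is not a decimal digit, otherwise whether the
/// digit sum is even (a hypothetical stand-in for the elided logic).
fn example(s: &str) -> bool {
    // `let ... else`: binds `sum` on `Some`; the `else` block must diverge.
    let Some(sum) = s.chars().map(|c| c.to_digit(10)).sum::<Option<u32>>() else {
        return false;
    };
    sum % 2 == 0
}
```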


Thanks, but the problem here is that it takes all elements until the condition arises and passes them on for further evaluation. What I want is to check the whole collection for some condition and, only if the condition is satisfied, do the calculation on it.
Hence one could first check s.chars().all(|c| is_valid(c)) and then do the calculations, but then one has to go through the iterator twice.
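A minimal sketch of the two-pass approach: a first pass checks with `all` that every char is valid, a second pass does the calculation. The helpers are hypothetical stand-ins (digits are valid, map to the digit value, keep even values).

```rust
// Hypothetical stand-ins for the helpers the thread leaves undefined.
fn is_valid(c: char) -> bool {
    c.is_ascii_digit()
}
fn some_function(c: char) -> u32 {
    c.to_digit(10).unwrap()
}
fn another_function(n: &u32) -> bool {
    *n % 2 == 0
}

fn example_two_pass(s: &str) -> Option<u32> {
    // Pass 1: validate everything; `all` stops at the first invalid char.
    if !s.chars().all(is_valid) {
        return None;
    }
    // Pass 2: every char is known to be valid, so the plain pipeline is safe.
    Some(s.chars().map(some_function).filter(another_function).sum())
}
```

This is simple and allocation-free; the cost is walking the input twice, which is cheap for `str::chars` but may matter for iterators that are expensive to recreate.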

Another approach looks like this:

fn example(s: &str) -> bool {
  let sum = if let Some(v) = s.chars()
    .map(|c| is_valid(&c).then(|| c))
    .map(|c| c.map(some_function))
    .filter(|c| c.as_ref().map_or(true, another_function))
    .sum() {
    v
  } else {
    return false
  };
  .....
}

This uses the Sum impl for Option, which produces None if any item is None. The map and filter were adapted to work on the Option-typed iterator items. In particular, note that the filter must retain any None values; otherwise invalid characters would simply be dropped instead of making the whole result None.
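A runnable version of the Option-based pipeline, assuming hypothetical helpers (digits are valid, map to the digit value, keep even values) and returning the `Option<u32>` directly rather than the thread's `bool`:

```rust
// Hypothetical stand-ins for the helpers the thread leaves undefined.
fn is_valid(c: &char) -> bool {
    c.is_ascii_digit()
}
fn some_function(c: char) -> u32 {
    c.to_digit(10).unwrap()
}
fn another_function(n: &u32) -> bool {
    *n % 2 == 0
}

fn example(s: &str) -> Option<u32> {
    s.chars()
        // Some(c) for valid chars, None for invalid ones.
        .map(|c| is_valid(&c).then(|| c))
        // Transform only the valid chars; None stays None.
        .map(|c| c.map(some_function))
        // Apply the predicate to Some values, but always keep None.
        .filter(|c| c.as_ref().map_or(true, another_function))
        // `Sum` for Option yields None as soon as any item is None.
        .sum()
}
```

Had the filter used `map_or(false, ...)`, the None for `'a'` in `"12a4"` would be filtered out and the result would wrongly be `Some(6)`.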


Okay, that looks very useful and very powerful. Thank you.

Wouldn't a function like check(), as in my second example, make a lot of sense in Rust's standard library for iterators? It would allow using Option, Result and more in iterator chains in a clean and performant way.

I don’t think that a function like the check you propose can work. What type signature would it have? Note how process_results needs to take a closure and pass it a new iterator, etc. I suppose the situation could be improved by having something like process_results that supports Option directly (sparing the conversion), and maybe also a convenience function that takes a boolean predicate so it fits your use case. Its use might look like this:

fn example(s: &str) -> bool {
  let sum = if let Some(v) = s.chars()
    .check(|c| is_valid(c), |iter| iter
    .map(|c| some_function(c))
    .filter(|c| another_function(c))
    .sum()) {
    v
  } else {
    return false
  };
  .....
}

(In both the code above and my first example, the formatting is a bit misleading; proper formatting would add some extra indentation to the closure body.)
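To make the idea concrete, here is a minimal std-only sketch of such a predicate-based helper. `check` and `Checked` are hypothetical names (neither std nor itertools provides them); the adapter yields items while the predicate holds, records the first failure in a flag, and `check` returns `None` if that flag was set after the closure finished.

```rust
/// Hypothetical adapter: yields items while `pred` holds, then stops and
/// records the failure in `failed`.
struct Checked<'a, I, P> {
    iter: I,
    pred: P,
    failed: &'a mut bool,
}

impl<I, P> Iterator for Checked<'_, I, P>
where
    I: Iterator,
    P: FnMut(&I::Item) -> bool,
{
    type Item = I::Item;

    fn next(&mut self) -> Option<I::Item> {
        if *self.failed {
            return None; // never resume after the first invalid item
        }
        let item = self.iter.next()?;
        if (self.pred)(&item) {
            Some(item)
        } else {
            *self.failed = true;
            None
        }
    }
}

/// Hypothetical helper in the style of `process_results`, but with a
/// boolean predicate and an `Option` result.
fn check<I, P, F, R>(iter: I, pred: P, f: F) -> Option<R>
where
    I: IntoIterator,
    P: FnMut(&I::Item) -> bool,
    F: FnOnce(Checked<'_, I::IntoIter, P>) -> R,
{
    let mut failed = false;
    let result = f(Checked { iter: iter.into_iter(), pred, failed: &mut failed });
    if failed { None } else { Some(result) }
}

/// Usage: sum of the even digits, or `None` if any char is not a digit.
fn example(s: &str) -> Option<u32> {
    check(s.chars(), |c| c.is_ascii_digit(), |iter| {
        iter.map(|c| c.to_digit(10).unwrap()).filter(|d| *d % 2 == 0).sum()
    })
}
```

Like `process_results`, this only detects an invalid item if the closure actually pulls it: a closure that stops early still produces `Some`.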


By the way, note that the Sum<Option<…>> … implementation uses an std-internal process_results, too!

(accum.rs - source)

impl<T, U> Sum<Option<U>> for Option<T>
where
    T: Sum<U>,
{
    fn sum<I>(iter: I) -> Option<T>
    where
        I: Iterator<Item = Option<U>>,
    {
        iter.map(|x| x.ok_or(())).sum::<Result<_, _>>().ok()
    }
}

Above, you can see the same ok_or(()) and .ok() conversion as in my example. It then uses Sum for Result, which does:

(accum.rs - source)

impl<T, U, E> Sum<Result<U, E>> for Result<T, E>
where
    T: Sum<U>,
{
    fn sum<I>(iter: I) -> Result<T, E>
    where
        I: Iterator<Item = Result<U, E>>,
    {
        iter::process_results(iter, |i| i.sum())
    }
}

where iter::process_results is a private std function similar to (or maybe the same as) the itertools one.
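A small sketch checking the chain described above with std's public API: summing `Option`s directly gives the same result as converting to `Result<_, ()>`, summing, and converting back, and the summation stops pulling items at the first `None`. The function names are illustrative only.

```rust
/// Sums with std's `Sum<Option<u32>> for Option<u32>`.
fn sum_option(items: &[Option<u32>]) -> Option<u32> {
    items.iter().copied().sum()
}

/// The same thing, spelled like the quoted std impl:
/// Option -> Result<_, ()> -> sum -> back to Option.
fn sum_via_result(items: &[Option<u32>]) -> Option<u32> {
    items.iter().copied().map(|x| x.ok_or(())).sum::<Result<u32, ()>>().ok()
}

/// Counts how many items the summation pulls before it stops.
fn items_pulled(items: &[Option<u32>]) -> usize {
    let mut pulled = 0;
    let _: Option<u32> = items.iter().copied().inspect(|_| pulled += 1).sum();
    pulled
}
```

With `[Some(1), None, Some(3)]`, `items_pulled` returns 2: the `Some(3)` after the `None` is never examined.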


Yes, that makes sense, but what about, for instance,
iter.check(|c| is_valid(c)) returning an Option<Iterator>?

One could then easily rewrite the examples, or isn't this possible?
Thank you very much for the detailed help!

Ah, I see. No, the problem is that iterators are lazy. Directly returning an Option<Iterator> implies that the is_valid check must be done eagerly, at the moment .check is called. What should happen to all the items then? They would need to be stored in a Vec or something similar, which means an additional allocation. If you want to do it this way, you certainly can, using .collect (which supports Result or Option item types as well).

Something like

iter.map(|c| is_valid(&c).then(|| c)).collect::<Option<Vec<_>>>()

But avoiding this extra allocation and interleaving the validity check with the further processing is often better. In particular, if the iterator is cheap to re-create, this kind of solution can be considerably worse than your original approach of simply iterating twice.
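A minimal sketch of the eager `collect` variant, assuming a hypothetical `is_valid` that accepts ASCII digits: the first pass validates and buffers every char into a `Vec` (or returns `None`), the second pass computes over the buffer.

```rust
// Hypothetical stand-in for `is_valid`.
fn is_valid(c: &char) -> bool {
    c.is_ascii_digit()
}

/// Eager variant: validate and buffer every char; `None` on the first invalid one.
fn validated(s: &str) -> Option<Vec<char>> {
    s.chars().map(|c| is_valid(&c).then(|| c)).collect()
}

/// Second pass over the buffered chars: sum of the even digits.
fn even_digit_sum(s: &str) -> Option<u32> {
    let chars = validated(s)?;
    Some(chars.iter().filter_map(|c| c.to_digit(10)).filter(|d| *d % 2 == 0).sum())
}
```

The `Vec` is the extra allocation mentioned above; for `str::chars`, re-iterating the string is usually cheaper than buffering it.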


In contrast to the above approach, the “lazy” way in which process_results(iterator_of_results, |iter| { … }) works is that it calls the closure with an iterator iter that either

  • contains all the (unwrapped) items if there are no errors; in that case, the (Ok-wrapped) result of calling the “|iter| { … }” closure is the result of process_results, or
  • contains all the (unwrapped) items up to (but not including) the first error, and then no more items. The result of the work that the closure has done so far is then simply discarded, and the error is returned.

The above only applies if the closure consumes all of the items from the iterator iter. If it only looks at, e.g., the first three items, those don’t contain any errors, and the closure then returns, we’re still in the success case even though the iterator might have contained some Err value later on.
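Both bullet points and the caveat can be observed with std alone, assuming (as in current std) that `FromIterator` for `Result` is built on the same internal shunting helper as `Sum`. `FirstThree` is a hypothetical collection that only pulls three items, standing in for a closure that returns early.

```rust
/// Hypothetical collection whose `FromIterator` only pulls the first three
/// items, like a `process_results` closure that returns early.
#[derive(Debug, PartialEq)]
struct FirstThree(Vec<u32>);

impl std::iter::FromIterator<u32> for FirstThree {
    fn from_iter<I: IntoIterator<Item = u32>>(iter: I) -> Self {
        FirstThree(iter.into_iter().take(3).collect())
    }
}

/// Collects everything, counting how many items are pulled from the source.
fn collect_counting(items: Vec<Result<u32, &'static str>>) -> (Result<Vec<u32>, &'static str>, usize) {
    let mut pulled = 0;
    let result: Result<Vec<u32>, &'static str> =
        items.into_iter().inspect(|_| pulled += 1).collect();
    (result, pulled)
}

/// Only the first three items are ever looked at.
fn collect_first_three(items: Vec<Result<u32, &'static str>>) -> Result<FirstThree, &'static str> {
    items.into_iter().collect()
}
```

With `[Ok(1), Ok(2), Err("bad"), Ok(4)]`, `collect_counting` returns `Err("bad")` after pulling 3 items, so `Ok(4)` is never seen. With `[Ok(1), Ok(2), Ok(3), Err("bad")]`, `collect_first_three` returns `Ok(FirstThree(vec![1, 2, 3]))` because the error is never pulled.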


Okay, I see. Then I think process_results is the most elegant option available. Thanks again.
