Handling errors from iterators

isagalaev · August 19, 2015, 9:45pm

Hello everyone!

I work with iterators that can signal errors and it proves to be quite laborious:

I'm using Result as Iterator::Item
Since .next() returns an Option, I can't use the standard try macro inside the function, so I implemented my own variant that rewraps errors in Some(Err(..)) instead.
The most common case for an iterator is to stop upon discovering an error. This usually doesn't just happen naturally, so I had to implement a wrapper around an iterator that remembers the fact of getting an Err value and produces Nones from then on. Now I have to be careful to always use wrapped iterators as the raw ones just keep producing errors indefinitely.
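The two workarounds above might look roughly like the following. This is only a sketch, not the poster's actual code: the names itry!, IntTokens and StopOnErr are made up for illustration, and the toy iterator parses comma-separated integers rather than JSON. Unlike the iterators described above, the toy one simply moves on to the next token after an error.

```rust
use std::num::ParseIntError;

/// Hypothetical `itry!`: like `try!`, but meant for use inside `next()`.
/// On `Err(e)` it returns `Some(Err(e))` from the enclosing function.
macro_rules! itry {
    ($expr:expr) => {
        match $expr {
            Ok(value) => value,
            Err(err) => return Some(Err(From::from(err))),
        }
    };
}

/// Toy fallible iterator (illustrative only): parses comma-separated integers.
struct IntTokens<'a> {
    tokens: std::str::Split<'a, char>,
}

impl<'a> IntTokens<'a> {
    fn new(input: &'a str) -> Self {
        IntTokens { tokens: input.split(',') }
    }
}

impl<'a> Iterator for IntTokens<'a> {
    type Item = Result<i32, ParseIntError>;

    fn next(&mut self) -> Option<Self::Item> {
        let token = self.tokens.next()?; // None once the input is exhausted
        let n = itry!(token.trim().parse::<i32>());
        Some(Ok(n))
    }
}

/// Hypothetical wrapper: yields the first `Err` and then only `None`.
struct StopOnErr<I> {
    inner: I,
    failed: bool,
}

impl<I> StopOnErr<I> {
    fn new(inner: I) -> Self {
        StopOnErr { inner, failed: false }
    }
}

impl<I, T, E> Iterator for StopOnErr<I>
where
    I: Iterator<Item = Result<T, E>>,
{
    type Item = Result<T, E>;

    fn next(&mut self) -> Option<Self::Item> {
        if self.failed {
            return None;
        }
        let item = self.inner.next();
        if let Some(Err(_)) = item {
            self.failed = true; // remember the error; stop from now on
        }
        item
    }
}
```

With this, StopOnErr::new(IntTokens::new("1,2,x,4")) yields two Ok items, then the parse error, then None.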

I'm sure this is not how it should be and I have a strong feeling that there must have been a big design discussion about it somewhere but I couldn't find any… Any pointers?

gkoz · August 19, 2015, 10:19pm

Can you expand on this? Why does the consumer's error handling code not stop after an error?

isagalaev · August 19, 2015, 10:28pm

My first consumer was a test that simply called .last() on an iterator, and the call never returned. Put more generally, there can be useful consumers (like adaptors) that don't inspect values beyond the Some/None distinction.

gkoz · August 20, 2015, 11:26am

Iterator adaptors are wrappers too and there doesn't seem to be a way around remembering to apply a particular adaptor.

This use case seems to call for either an inclusive variant of take_while or take_until

let a = [1, 2, 3, 4, 5];
let mut it = a.iter().take_until(|&a| *a >= 2);
assert_eq!(it.next(), Some(&1));
assert_eq!(it.next(), Some(&2));
assert!(it.next().is_none());
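take_until is not a method in the standard library; a minimal sketch of such an inclusive adaptor, written as an extension trait, could look like this. The names TakeUntil and TakeUntilExt are assumptions made for the example.

```rust
/// Hypothetical inclusive adaptor: yields items up to and including the
/// first one for which the predicate returns true, then stops.
struct TakeUntil<I, P> {
    iter: I,
    pred: P,
    done: bool,
}

impl<I, P> Iterator for TakeUntil<I, P>
where
    I: Iterator,
    P: FnMut(&I::Item) -> bool,
{
    type Item = I::Item;

    fn next(&mut self) -> Option<I::Item> {
        if self.done {
            return None;
        }
        let item = self.iter.next()?;
        if (self.pred)(&item) {
            self.done = true; // this item is still yielded, later ones are not
        }
        Some(item)
    }
}

/// Extension trait (an assumption of this sketch) so that `.take_until(..)`
/// can be called on any iterator.
trait TakeUntilExt: Iterator + Sized {
    fn take_until<P>(self, pred: P) -> TakeUntil<Self, P>
    where
        P: FnMut(&Self::Item) -> bool,
    {
        TakeUntil { iter: self, pred, done: false }
    }
}

impl<I: Iterator> TakeUntilExt for I {}
```

For an iterator of Results, .take_until(Result::is_err) would then yield the error as the final item.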

bluss · August 20, 2015, 11:30am

Iterator adaptors sort of break down if you need short-circuiting on errors. It totally makes sense to have an adaptor that stops after the first error.

I wonder, how often is this kind of thing sufficient though?

for result in iterator {
    let elt = try!(result);
    // rest of the loop
}
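On a current toolchain the ? operator plays the role of try!. A self-contained version of the same loop, with a made-up function name and a running sum standing in for "rest of the loop", might be:

```rust
/// Sums the Ok values and returns the first error, if any.
/// `sum_until_error` is an illustrative name, not an existing API.
fn sum_until_error<I, E>(iterator: I) -> Result<i64, E>
where
    I: IntoIterator<Item = Result<i64, E>>,
{
    let mut total = 0;
    for result in iterator {
        let elt = result?; // leaves the function on the first Err
        total += elt;
    }
    Ok(total)
}
```

The loop never pulls another item after an Err, so the iterator does not need to stop by itself.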

kstep · August 21, 2015, 12:01am

If you really need to stop the iterator on error as part of the iterator's own logic, stop it by returning None. If you need to pass the error out of the iterator, return Option<Result<T, E>> as you do now. Result implements the FromIterator trait for collection types, so you can do let xs = iter.collect::<Result<Vec<_>, _>>(); and get either the first error returned by the iterator or a vector of the Ok items (the same works for Option). If you really need to get the last returned value and stop on the first error, use iter.take_while(Result::is_ok).last().map(Result::unwrap).
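Both suggestions in a small, self-contained form. The function names are invented for the example, and integer parsing stands in for whatever the real iterator does:

```rust
use std::num::ParseIntError;

/// Either all parsed values, or the first parse error.
fn parse_all(tokens: &[&str]) -> Result<Vec<i32>, ParseIntError> {
    tokens.iter().map(|t| t.parse::<i32>()).collect()
}

/// The last value seen before the first error (the error itself is dropped).
fn last_before_error(tokens: &[&str]) -> Option<i32> {
    tokens
        .iter()
        .map(|t| t.parse::<i32>())
        .take_while(Result::is_ok)
        .last()
        .map(Result::unwrap) // safe here: take_while only lets Ok items through
}
```

Note that the first variant needs storage for every Ok item, and the second gives no access to the error.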

isagalaev · August 25, 2015, 5:50am

It took me some time to process all the feedback, but I'm back.

Stopping iterators

I sort of understand the point that a little adaptor boilerplate can solve the problem. But it's still boilerplate, and having to remember to do things just to make everything work as expected is not optimal; we can do better. So instead of answering the question of how often a for-loop with a try! would be enough, I'd rather ask it differently: is it ever useful to have an iterator that doesn't stop on the first error? My perception is biased, so I can't really answer that myself.

The take_while() suggestion (thanks @kstep!) is very nice, but it has another downside (apart from having to call it at all): it drops the last non-Ok value so it's not possible to inspect the error.

Example

Here's an example illustrating why I'm asking all these questions and what's bothering me. I'm writing a streaming JSON parser that's supposed to work on the fly without loading the whole document in memory or constructing an entire data structure. It's designed as an iterator, so it would be nice to use adaptors where possible. Here I'm trying to simply count numbers in an array, and the document has some garbage in it:

[1, 2, 3, #^%$, 4]

Let's try to parse it using take_while(Result::is_ok) as a guard:

let mut parser = parser::Parser::new(data);
let count = parser.by_ref().take_while(Result::is_ok).count();
if let Some(Err(e)) = parser.next() {
    println!("There was an error: {}", e)
}

This looks much more involved than it should be:

Since take_while() takes the iterator by value, I have to remember to call by_ref() first
I have to use mut because I'm calling next() manually to determine if there was an error

And ultimately, it doesn't work anyway as take_while() drops the error and there's no way to get it.
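The effect can be reproduced without the parser. In this sketch a plain Vec of Results stands in for it (an assumption; the real parser type isn't shown here), and the function name is invented:

```rust
/// Counts the leading Ok items with take_while, then returns whatever the
/// underlying iterator yields next.
fn count_then_peek(
    items: Vec<Result<i32, String>>,
) -> (usize, Option<Result<i32, String>>) {
    let mut iter = items.into_iter();
    // take_while pulls the Err out of `iter` in order to test it, then drops it.
    let count = iter.by_ref().take_while(Result::is_ok).count();
    (count, iter.next())
}
```

For the input Ok(1), Ok(2), Ok(3), Err(..), Ok(4) this gives a count of 3 and Some(Ok(4)) as the next item: the error has already been consumed.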

Proposal

I feel that this sort of problem is not unique to my library, so instead of just fixing my own iterators I'm investigating whether it makes sense to change how Rust works with iterators. Namely:

Use an iterator-specific enum type instead of Option: enum IterResult<T, E> {Value(T), Err(E), Stop} with the Err value officially considered to be the last one (so a for-loop would behave as expected)
Have an iterator-specific itry! macro
Modify all consuming adaptors to return Result values
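To make the proposal concrete, here is one way it could be sketched in today's Rust. Everything except the IterResult enum is an assumption added for illustration: the trait name FaultyIterator, the adapter FromResults, and the choice of count() as the sample consuming adaptor. Nothing like this exists in the standard library.

```rust
/// The three-state item type from the proposal.
#[derive(Debug, PartialEq)]
enum IterResult<T, E> {
    Value(T),
    Err(E),
    Stop,
}

/// Hypothetical iterator trait built on IterResult.
trait FaultyIterator {
    type Item;
    type Error;

    fn next(&mut self) -> IterResult<Self::Item, Self::Error>;

    /// A consuming adaptor that reports the error instead of hiding it.
    fn count(mut self) -> Result<usize, Self::Error>
    where
        Self: Sized,
    {
        let mut n = 0;
        loop {
            match self.next() {
                IterResult::Value(_) => n += 1,
                IterResult::Err(e) => return Err(e), // Err is treated as final
                IterResult::Stop => return Ok(n),
            }
        }
    }
}

/// Adapter from an ordinary iterator of Results (illustrative only).
struct FromResults<I>(I);

impl<I, T, E> FaultyIterator for FromResults<I>
where
    I: Iterator<Item = Result<T, E>>,
{
    type Item = T;
    type Error = E;

    fn next(&mut self) -> IterResult<T, E> {
        match self.0.next() {
            Some(Ok(v)) => IterResult::Value(v),
            Some(Err(e)) => IterResult::Err(e),
            None => IterResult::Stop,
        }
    }
}
```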

Does it make sense?

(A meta note: is it a good forum to discuss these things or should I file a proper RFC?)

kstep · August 25, 2015, 7:26am

Could you please state exactly what behavior you want from an iterator with regard to error handling?
In your example it usually makes sense to just call .collect::<Result<Vec<_>, _>>() on the iterator, as you usually want all the Ok elements anyway, and the count() example feels a little artificial.

kstep · August 25, 2015, 7:29am

Also see Itertools::fold_results() from itertools crate.

gkoz · August 25, 2015, 10:15am

It seems that ReadDir can yield Ok after an Err.

birkenfeld · August 25, 2015, 10:38am

Sure, you might want to handle that error and then continue with the iteration.

A good example is an iterator where the items are the result of some I/O action: listing files like @gkoz said, accepting connections on a socket (see Incoming in std::net), parsing requests from a client, etc.
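A rough sketch of that "handle the error and carry on" pattern, kept generic so it does not depend on real I/O. The function name and the on_ok callback are assumptions made for the example:

```rust
/// Processes every Ok item and records every error without stopping.
fn drain_with_recovery<I, T, E>(items: I, mut on_ok: impl FnMut(T)) -> Vec<E>
where
    I: IntoIterator<Item = Result<T, E>>,
{
    let mut errors = Vec::new();
    for item in items {
        match item {
            Ok(value) => on_ok(value),
            Err(err) => errors.push(err), // note the error and keep iterating
        }
    }
    errors
}
```

An iterator that stopped at its first Err would make this kind of consumer impossible.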

isagalaev · August 25, 2015, 5:55pm

The example is artificial, yes, sorry! The whole point of my library is parsing potentially huge JSON documents on the fly, without constructing the entire result in memory. count() is only a trivial example of processing that doesn't require keeping the whole result the whole time. For a more practical application, imagine a process that goes over JSON data from an API and does SQL INSERTs into a database.

isagalaev · August 25, 2015, 5:56pm

@gkoz, @birkenfeld thanks for the examples guys, I'll keep thinking then

kstep · August 26, 2015, 12:27pm

In your case you can easily do it with a simple for-in loop; there is no need for iterator combinators. Or just use map() to build insert queries from the JSON objects.

In other words, I think you are overcomplicating matters, I see no need for iterator API changes, nor need for any new constructions to do what you want.

isagalaev · August 26, 2015, 8:15pm

With the reasoning that everything can be done with simple loops and if statements, we'd never have any convenience methods at all. My intention was to use my own perceived difficulty at implementing faulty iterators as an example and find out whether there is something that can be improved at the language/stdlib level.

I do concede my point that Rust needs a harder treatment for errors, based on the examples in this thread. However there are still two things left that I feel can be improved:

try! isn't usable inside next() — and I don't know what a good fix here might be for the general case. For now I'm content with my local variety of it.
There's no obvious iterator adaptor implementing the behaviour I'm after: yield an error and then stop. All suggested ways have slight warts on them, like take_while would swallow the error and fold_results and collect require O(n) storage… However! I won't press this point any further as I can't really present a good convincing example, even for myself. Probably this use-case is indeed an edge case.

gkoz · August 26, 2015, 8:37pm

It makes sense to open an issue on the rfcs repo, the lack of take_until seems like an oversight.

kstep · August 27, 2015, 10:21am

You are a little wrong here, fold_results() doesn't require O(n) storage, you can fold with iter.fold_results(None, |_, item| Some(item)) and get Result<Option<T>, E>, which requires O(1) storage at worst.

I think Itertools::fold_results() is the thing you are really looking for.
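The same constant-storage idea can be written with Iterator::try_fold, which was added to the standard library after this thread. This is a std-only sketch, not the itertools implementation, and the function names are invented:

```rust
/// Last Ok value, or the first error, using O(1) extra storage.
fn last_or_error<I, T, E>(iter: I) -> Result<Option<T>, E>
where
    I: IntoIterator<Item = Result<T, E>>,
{
    iter.into_iter().try_fold(None, |_, item| item.map(Some))
}

/// Number of Ok values, or the first error, using O(1) extra storage.
fn count_or_error<I, T, E>(iter: I) -> Result<usize, E>
where
    I: IntoIterator<Item = Result<T, E>>,
{
    iter.into_iter().try_fold(0usize, |n, item| item.map(|_| n + 1))
}
```

Both stop pulling items as soon as an Err appears, and the error is returned rather than swallowed.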

kstep · August 27, 2015, 10:25am

take_until doesn't solve the problem, as it's really isomorphic to take_while(|x| !f(x)), and it would eat the last element, the one that stopped the iteration by meeting the given condition (read: the Err(e) item in this case). The real problem with take_while() is that it eats the last element it evaluated the predicate for.

bluss · August 27, 2015, 10:28am

I think itry!() sounds neat. I haven't run into this because in similar situations I've just moved the logic out of the iterator's next function, and I used regular try!() instead.

(Itertools has take_while_ref that "gives back" the failing element, but it only works for simple iterators -- for example when you scan a string.)
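Moving the logic out of next() might look like the sketch below on a current toolchain, where ? replaces try!. The helper name try_next and the toy integer-parsing iterator are assumptions made for the example.

```rust
use std::num::ParseIntError;

/// Toy iterator (illustrative only): parses comma-separated integers.
struct Numbers<'a> {
    tokens: std::str::Split<'a, char>,
}

impl<'a> Numbers<'a> {
    fn new(input: &'a str) -> Self {
        Numbers { tokens: input.split(',') }
    }

    /// All fallible work lives here, so the ordinary `?` operator is usable.
    fn try_next(&mut self) -> Result<Option<i32>, ParseIntError> {
        let token = match self.tokens.next() {
            Some(token) => token,
            None => return Ok(None), // input exhausted
        };
        let n = token.trim().parse::<i32>()?;
        Ok(Some(n))
    }
}

impl<'a> Iterator for Numbers<'a> {
    type Item = Result<i32, ParseIntError>;

    fn next(&mut self) -> Option<Self::Item> {
        // Result<Option<T>, E> -> Option<Result<T, E>>
        self.try_next().transpose()
    }
}
```

next() itself stays a one-liner, and no iterator-specific macro is needed.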

gkoz · August 27, 2015, 10:31am

kstep wrote: "take_until doesn't solve the problem, as it's really isomorphic to take_while(|x| !f(x)), and it would eat the last element"

Well sure, I meant one that would not eat the last element. If "until" implies otherwise to you, maybe it's not a good name.
