Chaining Iterators over Results

landersson · July 16, 2019, 9:34pm

I suspect some version of this is one of the most frequently asked questions about Rust, yet I have failed to find a good answer.

I'm trying to solve a simple problem: Read lines from a text file, parse each line as an int, and compute the sum of all parsed ints. Ignoring boilerplate, I can do this using:

let sum: i32 = f.lines().map(|x| x.unwrap().parse::<i32>().unwrap()).sum();

My question is, instead of ending up with an unwrap panic on IO or parse errors, is there any way I could do something along the lines of:

let sum: Result<i32, Box<dyn Error>> = f.lines().map(|x| x.parse::<i32>()).sum()

I want the iteration to stop on the first IO or parse error and return that error (as a Result) in sum.

I have learned that this can be partially solved by using collect() to turn an iterator of Results into a Result of Vec, but I assume that means storing the whole vector of Strings or i32s in memory. For the sake of this argument, imagine that the text file I'm processing is far larger than available RAM.

cuviper · July 16, 2019, 10:12pm

landersson:

My question is, instead of ending up with an unwrap panic on IO or parse errors, is there any way I could do something along the lines of:
let sum: Result<i32, Box<dyn Error>> = f.lines().map(|x| x.parse::<i32>()).sum()

You can do almost exactly this, since Result implements Sum as follows:

impl<T, U, E> Sum<Result<U, E>> for Result<T, E>
    where T: Sum<U>

Your map needs to convert the errors to your boxed error type though, first for the lines error and then for the parse error. The ? operator is perfect for this, since it converts the error with From on the way out, something like:

let sum: Result<i32, Box<dyn Error>>
    = f.lines().map(|x| Ok(x?.parse::<i32>()?)).sum();
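For anyone who wants to try this end to end, here is a minimal, self-contained sketch of the approach above. The function name `sum_lines` is made up for illustration, and the reader is assumed to be anything implementing `BufRead` (a `BufReader` over a file, or an in-memory `std::io::Cursor` for testing).

```rust
use std::error::Error;
use std::io::BufRead;

// Hypothetical helper, not from the thread: sums one integer per line.
// Each `?` converts its error (io::Error, then ParseIntError) into
// Box<dyn Error>; `sum()` returns the first Err it meets and takes no
// further items from the iterator.
fn sum_lines(reader: impl BufRead) -> Result<i32, Box<dyn Error>> {
    reader
        .lines()
        .map(|line| Ok(line?.parse::<i32>()?))
        .sum()
}
```

Called as `sum_lines(std::io::Cursor::new("1\n2\n3\n"))`, this should give `Ok(6)`; a line such as `x` makes it return the parse error instead.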

landersson · July 17, 2019, 8:02am

Thanks, I hadn't thought about using the '?' operator inside an expression the way you did with "x?.parse", and I also wasn't aware that Result implements Sum. I've tried your suggestion and it solves my particular problem pretty well.

What if Result didn't implement Sum?

I've also tried extending the example by adding a lambda that adds 1 to each int before the sum op. The best solution I've come up with is:

    let sum: Result<i32, Box<dyn Error>> = f
        .lines()
        .map(|x| Ok(x?.parse::<i32>()?))
        .map(|x: Result<i32, Box<dyn Error>>| Ok(x? + 1))
        .sum();

I could not make it work without explicitly stating the type of the parameter 'x' in the second lambda. I guess for me, the readability of the iterator solution above is starting to become worse than for an equivalent solution based on a traditional for loop. Any ideas on how to make the iterator version less verbose?

Also, in an iterator chain like the one above, is there a way to "escape" from wrapping everything in Results once the operations performed can no longer fail (i.e., after the parse step in my case)?

alice · July 17, 2019, 8:30am

The difficult thing is that every ? may change the error type. You can use the map function on Result in the second map to make that one not change the error type, e.g.

let sum: Result<i32, Box<dyn Error>> = f
    .lines()
    .map(|x| Ok(x?.parse::<i32>()?))
    .map(|res| res.map(|x| x+1))
    .sum();

If Result didn't implement Sum, you wouldn't be able to do the short-circuiting without a for loop or something.
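As a small illustration of the `Result::map` point, outside any iterator: `map` only touches the `Ok` value and passes an `Err` through with its type unchanged, so there is no new error type for the compiler to infer. The helper name `add_one` is hypothetical.

```rust
use std::num::ParseIntError;

// Hypothetical helper: parse a string, then add 1 to the success value.
// Result::map leaves the error type (ParseIntError) exactly as it was.
fn add_one(s: &str) -> Result<i32, ParseIntError> {
    s.parse::<i32>().map(|x| x + 1)
}
```

If the extra step is this simple, it can also be folded into the first closure, as in `Ok(x?.parse::<i32>()? + 1)`, so that no second `map` over a `Result` is needed.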

cuviper · July 17, 2019, 2:58pm

Without Sum, you could do a short-circuiting accumulation with try_fold instead.

landersson · July 18, 2019, 9:36pm

Thanks @alice, using map on Result makes the iterator version nicer...

I'm probably biased from decades of procedural/OO programming, but I still find the regular for loop version more readable:

    let mut sum = 0;
    for line in f.lines() {
        let x = line?.parse::<i32>()?;
        sum += x + 1;
    }

I just wish there was a way to avoid having to declare the sum variable mutable, but not sure how that would make sense...
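One possible way to keep the `mut` contained, sketched here with a made-up function name: put the loop in its own function, so the mutable accumulator is a local detail and the caller binds the result immutably.

```rust
use std::error::Error;
use std::io::BufRead;

// Hypothetical wrapper around the loop above. `sum` is mutable only
// inside this function; `?` returns early on the first IO or parse error.
fn sum_plus_one(reader: impl BufRead) -> Result<i32, Box<dyn Error>> {
    let mut sum = 0;
    for line in reader.lines() {
        let x = line?.parse::<i32>()?;
        sum += x + 1;
    }
    Ok(sum)
}
```

A caller would then write something like `let total = sum_plus_one(reader)?;` without declaring anything mutable.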

landersson · July 18, 2019, 9:40pm

@cuviper, I've tried to implement the sum operation using try_fold but I haven't managed to make it work... would you mind posting how you would do it?

cuviper · July 18, 2019, 9:57pm

let sum: Result<i32, Box<dyn Error>> = f
    .lines()
    .try_fold(0, |acc, line| Ok(acc + line?.parse::<i32>()?));

or spelled out a bit more:

let sum: Result<i32, Box<dyn Error>> = f
    .lines()
    .try_fold(0, |acc, line_result| {
        let line = line_result?;
        let x = line.parse::<i32>()?;
        Ok(acc + x)
    });
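To check that `try_fold` really stops at the first error, a rough sketch like the following counts how many items were pulled from the iterator. The function name and the use of a string slice instead of a file are assumptions made to keep the example self-contained.

```rust
use std::error::Error;

// Hypothetical helper: sums parsed integers with try_fold and also
// reports how many items were taken from the underlying iterator.
fn sum_until_error(items: &[&str]) -> (Result<i32, Box<dyn Error>>, usize) {
    let mut pulled = 0;
    let result = items
        .iter()
        // Runs once for every item the fold actually requests.
        .inspect(|_| pulled += 1)
        .try_fold(0i32, |acc, s| -> Result<i32, Box<dyn Error>> {
            Ok(acc + s.parse::<i32>()?)
        });
    (result, pulled)
}
```

With the input `["1", "2", "x", "4"]` the result is an `Err` and only three items are pulled; the `"4"` is never looked at.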

landersson · July 18, 2019, 10:02pm

Great, thanks!

Btw, I tried and failed doing it this way:

let sum: Result<i32, Box<dyn Error>> = f
    .lines()
    .map(|x| Ok(x?.parse::<i32>()?))
    .map(|res| res.map(|x| x + 1))
    .try_fold(0i32, |a, x| Ok(a + x?));

landersson · July 18, 2019, 10:03pm

Out of curiosity, do you experienced Rust programmers find the for loop version or the iterator version more readable / preferable?

TomP · July 18, 2019, 10:17pm

I suspect that most experienced Rustaceans prefer the iterator version because, in general, the compiler can elide more bounds checks with the iterator version, and thus generate smaller, faster safe code.

cuviper · July 18, 2019, 10:53pm

landersson:

Btw, I tried and failed doing it this way:

let sum: Result<i32, Box<dyn Error>> = f
    .lines()
    .map(|x| Ok(x?.parse::<i32>()?))
    .map(|res| res.map(|x| x + 1))
    .try_fold(0i32, |a, x| Ok(a + x?));

   |
   |         .map(|res| res.map(|x| x + 1))
   |               ^^^ consider giving this closure parameter a type

Right, so like @alice mentioned before, each ? can change the error type using From. So Rust knows that your first map will return some error type, and the try_fold will return some (potentially different) error type. Type inference for the latter is constrained by the type of let sum: ..., but the compiler doesn't know what to use for the intermediate errors from the first map.

You can specify the intermediate type in (at least) two ways:

    let sum: Result<i32, Box<dyn Error>> = f
        .lines()
        .map(|x| -> Result<i32, Box<dyn Error>> { Ok(x?.parse::<i32>()?) })
        .map(|res| res.map(|x| x + 1))
        .try_fold(0i32, |a, x| Ok(a + x?));

    let sum: Result<i32, Box<dyn Error>> = f
        .lines()
        .map(|x| Ok(x?.parse::<i32>()?))
        .map(|res: Result<i32, Box<dyn Error>>| res.map(|x| x + 1))
        .try_fold(0i32, |a, x| Ok(a + x?));

Or you can avoid changing the error type in try_fold, so the type is passed directly through.

    let sum: Result<i32, Box<dyn Error>> = f
        .lines()
        .map(|x| Ok(x?.parse::<i32>()?))
        .map(|res| res.map(|x| x + 1))
        .try_fold(0i32, |a, res| res.map(|x| a + x));
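A further option, offered only as a sketch and not something from the posts above, is to pin the intermediate type with a turbofish on `Ok` in the first closure. The function name is hypothetical and the chain is wrapped in a function so it can be compiled on its own.

```rust
use std::error::Error;
use std::io::BufRead;

// Hypothetical wrapper for the chain discussed above.
fn sum_plus_one_chain(reader: impl BufRead) -> Result<i32, Box<dyn Error>> {
    reader
        .lines()
        // The turbofish fixes this closure's return type, so the later
        // closures have a concrete Result type to work with.
        .map(|x| Ok::<i32, Box<dyn Error>>(x?.parse::<i32>()?))
        .map(|res| res.map(|x| x + 1))
        .try_fold(0i32, |a, x| Ok(a + x?))
}
```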

I prefer iterators to a point, but I probably wouldn't split out the maps as far as you did; I'd rather use a unified fold or try_fold expression. A for loop just hammers on Iterator::next(), but folding can often use tighter inner loops. Note that for_each and try_for_each are also implemented with folds, so they carry the same advantage.
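For comparison, the `try_for_each` variant mentioned here might look roughly like this. It is only a sketch with a made-up function name; it keeps a mutable accumulator outside the closure, much like the for loop does.

```rust
use std::error::Error;
use std::io::BufRead;

// Hypothetical helper: same sum, driven by try_for_each. The closure
// returns Err to stop the iteration early, or Ok(()) to continue.
fn sum_with_try_for_each(reader: impl BufRead) -> Result<i32, Box<dyn Error>> {
    let mut sum = 0i32;
    reader
        .lines()
        .try_for_each(|line| -> Result<(), Box<dyn Error>> {
            sum += line?.parse::<i32>()?;
            Ok(())
        })?;
    Ok(sum)
}
```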

Plus, the iterator style is better suited to switching to parallel iterators later with rayon.
(Disclaimer: I am one of rayon's authors.)

@TomP: the point about bounds checks is true compared to a for loop that's manually indexing, but you can also get that kind of benefit just by using the for loop directly on the collection's iterator.

landersson · July 19, 2019, 9:18am

Thanks @cuviper for the clarification. Your version using try_fold is starting to look quite nice and readable to me. Let's say I want to write a function that takes a BufRead, parses all the lines as numbers and sums them up... like:

fn sum_up_file_fold(reader: impl BufRead) -> Result<i32, Box<dyn Error>> {
    reader
        .lines()
        .try_fold(0, |acc, line| Ok(acc + line?.parse::<i32>()?))
}

Is using a Box<dyn Error> as error type the way to go here, and in general when dealing with multiple possible errors? Or is it generally better to be more explicit about the error by creating some custom error type that can contain both the io and parse errors?

alice · July 19, 2019, 12:41pm

While using a Box<dyn Error> allows you to put any error inside, it is difficult to unpack the error if you need to provide special handling for some errors, instead of just failing and printing it to the user.

If you're building a library, you should not use Box<dyn Error> since it prevents users of the library from properly handling errors, but if you're writing some binary that will just print the error to the user, using a Box<dyn Error> is fine.
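For the library case, a custom error type could look roughly like the sketch below. The name `SumError` and its variants are assumptions for illustration; the `From` impls are what allow `?` to keep working unchanged inside the closure.

```rust
use std::error::Error;
use std::fmt;
use std::io::{self, BufRead};
use std::num::ParseIntError;

// Hypothetical error type; the name and variants are not from the thread.
#[derive(Debug)]
enum SumError {
    Io(io::Error),
    Parse(ParseIntError),
}

impl fmt::Display for SumError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            SumError::Io(e) => write!(f, "read failed: {}", e),
            SumError::Parse(e) => write!(f, "invalid integer: {}", e),
        }
    }
}

impl Error for SumError {
    // Expose the underlying error so callers can walk the chain.
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        match self {
            SumError::Io(e) => Some(e),
            SumError::Parse(e) => Some(e),
        }
    }
}

// These conversions are what `?` uses for each kind of failure.
impl From<io::Error> for SumError {
    fn from(e: io::Error) -> Self {
        SumError::Io(e)
    }
}

impl From<ParseIntError> for SumError {
    fn from(e: ParseIntError) -> Self {
        SumError::Parse(e)
    }
}

fn sum_up_file(reader: impl BufRead) -> Result<i32, SumError> {
    reader
        .lines()
        .try_fold(0i32, |acc, line| Ok(acc + line?.parse::<i32>()?))
}
```

A caller can then `match` on `SumError::Io` versus `SumError::Parse` to handle the two failures differently, which is the part that is awkward with `Box<dyn Error>`.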

landersson · July 29, 2019, 1:31pm

Thanks Alice, that makes sense.

system · October 27, 2019, 1:39pm

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.
