Chaining Iterators over Results

I think some version of this issue may be one of the most frequently asked questions about Rust, yet I have failed to find a good answer.

I’m trying to solve a simple problem: Read lines from a text file, parse each line as an int, and compute the sum of all parsed ints. Ignoring boilerplate, I can do this using:

let sum: i32 = f.lines().map(|x| x.unwrap().parse::<i32>().unwrap()).sum();

My question is, instead of ending up with an unwrap panic on IO or parse errors, is there any way I could do something along the lines of:

let sum: Result<i32, Box<dyn Error>> = f.lines().map(|x| x.parse::<i32>()).sum()

I want the iteration to stop on the first IO or parse error and return the error that occurred (as a Result) in sum.

I have learned that this can be partially solved by using collect() to convert an iterator of Results into a Result of Vec, but I assume that means storing the whole vector of Strings or i32s in memory. For the sake of this argument, imagine that the text file I’m processing is way larger than available RAM.
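For comparison, the collect()-based approach mentioned above could look roughly like the sketch below. The function name sum_via_collect is made up for illustration. It does stop at the first error, but every parsed value is buffered in a Vec before anything is summed, which is the memory cost the question wants to avoid.

```rust
use std::error::Error;
use std::io::BufRead;

// Hypothetical helper: collects all parsed values first, then sums them.
fn sum_via_collect(reader: impl BufRead) -> Result<i32, Box<dyn Error>> {
    let values = reader
        .lines()
        // Convert both the I/O error and the parse error into Box<dyn Error>.
        .map(|line| -> Result<i32, Box<dyn Error>> { Ok(line?.parse::<i32>()?) })
        // Collecting into Result<Vec<_>, _> stops at the first Err,
        // but on success the whole Vec is held in memory.
        .collect::<Result<Vec<i32>, Box<dyn Error>>>()?;
    Ok(values.iter().sum())
}
```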

You can do almost exactly this, since Result implements Sum as follows:

impl<T, U, E> Sum<Result<U, E>> for Result<T, E>
    where T: Sum<U>

Your map needs to do something to convert the errors to your boxed error type though, first for the lines error and then for the parse error. The Try ?-operator is perfect for this, something like:

let sum: Result<i32, Box<dyn Error>>
    = f.lines().map(|x| Ok(x?.parse::<i32>()?)).sum();
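As a self-contained sketch of the same idea, assuming the input is anything that implements BufRead: the name sum_lines is invented here, and the closure return type is spelled out only to keep type inference out of the picture.

```rust
use std::error::Error;
use std::io::BufRead;

// Hypothetical helper name; sums one integer per line.
fn sum_lines(reader: impl BufRead) -> Result<i32, Box<dyn Error>> {
    reader
        .lines()
        // The first `?` converts io::Error and the second converts
        // ParseIntError, both into Box<dyn Error> via From.
        .map(|line| -> Result<i32, Box<dyn Error>> { Ok(line?.parse::<i32>()?) })
        // Result implements Sum over an iterator of Results, so this
        // stops at the first Err and returns it.
        .sum()
}
```

A byte slice implements BufRead, so something like sum_lines("1\n2\n3\n".as_bytes()) is a convenient way to try it without a real file.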

Thanks, I hadn’t thought about using the ‘?’ operator inside a statement like you did with “x?.parse”, and also wasn’t aware that Result implements Sum. I’ve tried your suggestion and it does solve my particular problem pretty well.

What if Result didn’t implement Sum?

I’ve also tried extending the example by adding a lambda that adds 1 to each int before the sum op. The best solution I’ve come up with is:

    let sum: Result<i32, Box<dyn Error>> = f
        .lines()
        .map(|x| Ok(x?.parse::<i32>()?))
        .map(|x: Result<i32, Box<dyn Error>>| Ok(x? + 1))
        .sum();

I could not make it work without explicitly stating the type of the parameter ‘x’ in the second lambda. I guess for me, the readability of the iterator solution above is starting to become worse than for an equivalent solution based on a traditional for loop. Any ideas on how to make the iterator version less verbose?

Also, in an iterator chain like above, is there a way to “escape” from wrapping everything in Results once the operations performed can no longer fail (i.e. after the parse step in my case)?

The difficult thing is that every ? may change the error type. You can use the map function on Result in the second map to make that one not change the error type, e.g.

let sum: Result<i32, Box<dyn Error>> = f
    .lines()
    .map(|x| Ok(x?.parse::<i32>()?))
    .map(|res| res.map(|x| x+1))
    .sum();

If Result didn’t implement Sum, you wouldn’t be able to do the short-circuiting without a for loop or something.


Without Sum, you could do a short-circuiting accumulation with try_fold instead.


Thanks @alice, using map on Result makes the iterator version nicer…

I’m probably biased from decades of procedural/OO programming, but I still think I find the regular for loop version more readable though:

    let mut sum = 0;
    for line in f.lines() {
        let x = line?.parse::<i32>()?;
        sum += x + 1;
    }

I just wish there was a way to avoid having to declare the sum variable mutable, but I’m not sure how that would make sense…

@cuviper, I’ve tried to implement the sum operation using try_fold but I haven’t managed to make it work… would you mind posting how you would do it?

let sum: Result<i32, Box<dyn Error>> = f
    .lines()
    .try_fold(0, |acc, line| Ok(acc + line?.parse::<i32>()?));

or spelled out a bit more:

let sum: Result<i32, Box<dyn Error>> = f
    .lines()
    .try_fold(0, |acc, line_result| {
        let line = line_result?;
        let x = line.parse::<i32>()?;
        Ok(acc + x)
    });
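To see the short-circuiting in action, here is a small sketch that also counts how many lines the closure was called on. The function name sum_and_count and the counter are illustrative additions, not something from the posts above.

```rust
use std::error::Error;
use std::io::BufRead;

// Hypothetical helper: returns the try_fold result plus the number of
// lines the closure actually looked at.
fn sum_and_count(reader: impl BufRead) -> (Result<i32, Box<dyn Error>>, usize) {
    let mut seen = 0;
    let sum = reader
        .lines()
        .try_fold(0, |acc, line| -> Result<i32, Box<dyn Error>> {
            seen += 1;
            // Any Err returned here ends the fold immediately.
            Ok(acc + line?.parse::<i32>()?)
        });
    (sum, seen)
}
```

With the input "1\nx\n3\n4\n" the closure runs on the first two lines only, since the parse failure on the second line ends the fold.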

Great, thanks!

Btw, I tried and failed doing it this way:

let sum: Result<i32, Box<dyn Error>> = f
        .lines()
        .map(|x| Ok(x?.parse::<i32>()?))
        .map(|res| res.map(|x| x + 1))
        .try_fold(0i32, |a, x| Ok(a + x?));

Out of curiosity, do you experienced Rust programmers find the for loop version or the iterator version more readable / preferable?

I suspect that most experienced Rustaceans prefer the iterator version because, in general, the compiler can elide more bounds checks with the iterator version, and thus generate smaller, faster safe code.

   |
   |         .map(|res| res.map(|x| x + 1))
   |               ^^^ consider giving this closure parameter a type

Right, so like @alice mentioned before, each ? can change the error type using From. So Rust knows that your first map will return some error type, and the try_fold will return some (potentially different) error type. Type inference for the latter is constrained by the type of let sum: ..., but it doesn’t know what to use for the intermediate errors from the first map.
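A minimal sketch of why the compiler cannot pick the error type from the expression alone: the two functions below (names invented for illustration) have the same body, and the ? converts to whatever error type the surrounding signature asks for.

```rust
use std::error::Error;
use std::num::ParseIntError;

// Here `?` converts ParseIntError into Box<dyn Error> via From.
fn parse_boxed(s: &str) -> Result<i32, Box<dyn Error>> {
    Ok(s.parse::<i32>()?)
}

// Same body, but here `?` keeps ParseIntError, using the identity
// conversion From<T> for T.
fn parse_plain(s: &str) -> Result<i32, ParseIntError> {
    Ok(s.parse::<i32>()?)
}
```

In a closure with no declared return type, nothing plays the role of these signatures, so the error type has to come from somewhere else in the chain.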

You can specify the intermediate type in (at least) two ways:

    let sum: Result<i32, Box<dyn Error>> = f
        .lines()
        .map(|x| -> Result<i32, Box<dyn Error>> { Ok(x?.parse::<i32>()?) })
        .map(|res| res.map(|x| x + 1))
        .try_fold(0i32, |a, x| Ok(a + x?));

    let sum: Result<i32, Box<dyn Error>> = f
        .lines()
        .map(|x| Ok(x?.parse::<i32>()?))
        .map(|res: Result<i32, Box<dyn Error>>| res.map(|x| x + 1))
        .try_fold(0i32, |a, x| Ok(a + x?));

Or you can avoid changing the error type in try_fold, so the type is passed directly through.

    let sum: Result<i32, Box<dyn Error>> = f
        .lines()
        .map(|x| Ok(x?.parse::<i32>()?))
        .map(|res| res.map(|x| x + 1))
        .try_fold(0i32, |a, res| res.map(|x| a + x));

I prefer iterators up to a point, but I probably wouldn’t split out the maps as far as you did. I would rather use a unified fold or try_fold expression. A for loop just hammers on Iterator::next(), but folding can often use tighter inner loops. Note that for_each and try_for_each are also implemented with folds, so they carry the same advantage.
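As a rough sketch of what a unified expression could look like for the "add 1 to each value" variant, here are a try_fold version and a try_for_each version. Both function names are made up for this example, and the closure return types are written out so that inference has nothing to guess.

```rust
use std::error::Error;
use std::io::BufRead;

// Hypothetical helper: parse, add 1 and accumulate in a single try_fold.
fn sum_plus_one(reader: impl BufRead) -> Result<i32, Box<dyn Error>> {
    reader
        .lines()
        .try_fold(0, |acc, line| -> Result<i32, Box<dyn Error>> {
            let x = line?.parse::<i32>()?;
            Ok(acc + x + 1)
        })
}

// Hypothetical helper: the same computation with try_for_each and an
// accumulator captured by the closure.
fn sum_plus_one_for_each(reader: impl BufRead) -> Result<i32, Box<dyn Error>> {
    let mut sum: i32 = 0;
    reader
        .lines()
        .try_for_each(|line| -> Result<(), Box<dyn Error>> {
            sum += line?.parse::<i32>()? + 1;
            Ok(())
        })?;
    Ok(sum)
}
```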

Plus, the iterator style is more suited toward switching to parallel iterators later with rayon.
(Disclaimer, I am one of rayon’s authors.)

That’s true compared to a for loop that’s manually indexing, but you can also get that kind of benefit just by using the for loop directly on the collection iterator.


Thanks @cuviper for the clarification. Your version using try_fold is starting to look quite nice and readable to me. Let's say I want to write a function that takes a BufRead, parses all the lines as numbers and sums them up... like:

fn sum_up_file_fold(reader: impl BufRead) -> Result<i32, Box<dyn Error>> {
    reader
        .lines()
        .try_fold(0, |acc, line| Ok(acc + line?.parse::<i32>()?))
}

Is using a Box<dyn Error> as error type the way to go here, and in general when dealing with multiple possible errors? Or is it generally better to be more explicit about the error by creating some custom error type that can contain both the io and parse errors?

While using a Box<dyn Error> allows you to put any error inside, it is difficult to unpack the error if you need to provide special handling for some errors, instead of just failing and printing it to the user.
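To illustrate what "unpacking" a boxed error involves, here is a sketch that uses downcast_ref to find out which concrete error is inside. The names sum_up and describe are invented for the example, and the caller has to know up front which concrete types might be in the box.

```rust
use std::error::Error;
use std::io::{self, BufRead};
use std::num::ParseIntError;

// Hypothetical helper, same shape as the try_fold function above.
fn sum_up(reader: impl BufRead) -> Result<i32, Box<dyn Error>> {
    reader
        .lines()
        .try_fold(0, |acc, line| -> Result<i32, Box<dyn Error>> {
            Ok(acc + line?.parse::<i32>()?)
        })
}

// Hypothetical helper: recovers the concrete error type by trying one
// downcast after another.
fn describe(err: &(dyn Error + 'static)) -> &'static str {
    if err.downcast_ref::<ParseIntError>().is_some() {
        "parse error"
    } else if err.downcast_ref::<io::Error>().is_some() {
        "I/O error"
    } else {
        "unknown error"
    }
}
```

Nothing in the signature of sum_up tells the caller that these two types are the ones to try, which is the difficulty described above.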

If you're building a library, you should not use Box<dyn Error> since it prevents users of the library from properly handling errors, but if you're writing some binary that will just print the error to the user, using a Box<dyn Error> is fine.
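For the library case, a custom error type could look something like the sketch below. The enum SumError, its variant names and the function name sum_up_file are assumptions made for illustration. The two From impls are what let the ? operator keep working unchanged.

```rust
use std::error::Error;
use std::fmt;
use std::io::{self, BufRead};
use std::num::ParseIntError;

// Hypothetical error type holding either of the two possible failures.
#[derive(Debug)]
enum SumError {
    Io(io::Error),
    Parse(ParseIntError),
}

impl fmt::Display for SumError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            SumError::Io(e) => write!(f, "I/O error: {}", e),
            SumError::Parse(e) => write!(f, "parse error: {}", e),
        }
    }
}

impl Error for SumError {}

// These conversions are what `?` uses inside the closure below.
impl From<io::Error> for SumError {
    fn from(e: io::Error) -> Self {
        SumError::Io(e)
    }
}

impl From<ParseIntError> for SumError {
    fn from(e: ParseIntError) -> Self {
        SumError::Parse(e)
    }
}

fn sum_up_file(reader: impl BufRead) -> Result<i32, SumError> {
    reader
        .lines()
        .try_fold(0, |acc, line| -> Result<i32, SumError> {
            Ok(acc + line?.parse::<i32>()?)
        })
}
```

Callers can then match on SumError::Io and SumError::Parse directly instead of downcasting.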

Thanks Alice, that makes sense.