Flat_map on an iterator of Results

I'm trying to create an iterator of words in a file. The inline version of this is

    let file = File::open(filename)?;
    let buf_reader = io::BufReader::new(file);

    for line in buf_reader.lines() {
        for word in line?.split_whitespace() {
            // do something with the word
        }
    }

I want to convert this into an iterator. In Python, I'd just use yield at the point where I say "do something", but that's not available in Rust. I tried to implement an iterator myself, but the state management made my brain explode.

This is almost perfect for flat_map: buf_reader.lines().flat_map(|l| l.split_whitespace()). The problem is that any individual line is actually a Result<String>, so I have to unwrap it to split on whitespace.

I've been trying various combinations of map_or_else, or_else, etc, for hours, but everything I try gives me a type error. What I want is something that yields Result<String>, where getting an error from lines() gives that error from next(), but getting a line from lines() gives the words in the line.

And yes, I can write code that just does things inline. That's what I have at the moment. And in many ways there's no real benefit to be had from changing the code. But for me, there's a huge benefit in better understanding what I can and can't do with Rust iterators.

So I guess what I'm asking is, what is an idiomatic way of writing an iterator in Rust that reads a file word by word, in a way that lets the caller catch and handle errors? Basically like the lines() method of BufReader, but for words...

You can do something like this:

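    use std::io::{BufRead, Result as IoResult};

    fn by_words<R: BufRead>(reader: R) -> impl Iterator<Item = IoResult<String>> {
        reader.lines().flat_map(|line| match line {
            // A failed read becomes a single Err item.
            Err(error) => vec![Err(error)],
            // The words borrow from the local `line`, so copy them into owned Strings.
            Ok(line) => line.split_whitespace().map(|s| Ok(s.to_owned())).collect::<Vec<_>>(),
        })
    }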

Hmm, I tried something similar to that and got a bunch of type errors. Maybe because I'm trying to add anyhow::Result into the mix, or maybe because I'm trying to avoid things like collecting into a Vec because I feel like it adds unnecessary allocations (does it?). Basically I'm trying to be "too clever" before I've got the basics clear :)

Thanks, though, I'll do some playing with this version and see if the logic "clicks" with me.

This is about my 3rd time trying to get started with Rust. I can get stuff working, but it doesn't ever feel like I've really understood the idioms with Rust, and I end up just adding fixes the compiler suggests, blindly, not really understanding what's going on. I can get things to work, but it still feels like more luck than judgement. Ah well, I'll get there in the end :)


This is a very common offramp from learning Rust; you're not alone.

I'd strongly suggest that you just not worry about it for a while. Just add a few .to_owned()s in a few places, or collect into Vec<_>s if iterators are being awkward. It's fine.

You can always come back later, once things are at least working, and improve any hotspots. But one of the magical things about Rust is that it's pretty damn fast even without heroic efforts. (With a reasonable allocator, copying a reasonable-size string just isn't that big a deal.)

One other thing you might be hitting: everyone would love to have a conversion from impl Iterator<Item = Result<T, E>> to Result<impl Iterator<Item = T>, E>. But it's impossible to do that without allocation: if it's lazy it can't look into the future to see that there will be an error, and if it's non-allocating it has nowhere to store all the Ts it might need to return if there's no E. So you might just be in a place where trying iterator adapters is never going to work as perfectly as you'd like.
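To make that trade-off concrete, here is a minimal sketch of the eager version of the conversion. The name try_all is made up for illustration; it is not a standard library function. It works only because it buffers every T in a Vec before it can decide between Ok and Err:

```rust
/// Hypothetical helper: turns an iterator of Results into a Result of an
/// iterator by buffering every item first. Stops at the first Err.
fn try_all<T, E>(
    items: impl Iterator<Item = Result<T, E>>,
) -> Result<impl Iterator<Item = T>, E> {
    // Collecting into Result<Vec<T>, E> short-circuits on the first error.
    let buffered: Vec<T> = items.collect::<Result<Vec<T>, E>>()?;
    // Only now, with every item seen, can we hand back a plain iterator.
    Ok(buffered.into_iter())
}
```

The allocation is the price of knowing up front that no error is coming; a lazy adapter has to report errors item by item instead.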

Rust doesn't have lending iterators to date, and an Iterator can't hand out references to its internal buffers (part of the current line, say). That's why Lines hands out a freshly allocated String each time.
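If the per-word allocations do end up mattering, one way around the missing lending iterator is to invert control and pass a closure instead of returning an iterator. This is only a sketch; for_each_word is a hypothetical name. It reuses a single line buffer, and the closure sees each word as a borrowed &str that is valid only for the duration of the call:

```rust
use std::io::{self, BufRead};

/// Hypothetical helper: calls `f` once per whitespace-separated word,
/// reusing one String as the line buffer. Returns the first I/O error.
fn for_each_word<R: BufRead>(mut reader: R, mut f: impl FnMut(&str)) -> io::Result<()> {
    let mut line = String::new();
    loop {
        // Reuse the same allocation for every line.
        line.clear();
        if reader.read_line(&mut line)? == 0 {
            return Ok(()); // end of input
        }
        for word in line.split_whitespace() {
            // `word` borrows from `line`, which is fine inside the loop body.
            f(word);
        }
    }
}
```

The cost is that the caller no longer gets an Iterator, so adapters like map and filter are not available on the result.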

I feel this is a good tutorial type problem but don't have time to write one up just now; I may return to it later.

It does add allocations for the Err case, although there isn't – I believe – an allocation-free version, due to splitting on a local variable (you have to collect the words, otherwise it doesn't compile). At that point, the single-element vec![], which usually isn't hit, isn't much of a concern.
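For what it's worth, the Vec in the Err arm can be avoided if you really want to, though the words of each line still have to be collected. A rough sketch, with by_words_chained as a made-up name: both cases produce the same iterator type by chaining a (possibly empty) Vec of words with an Option holding the error. Vec::new() does not allocate until something is pushed.

```rust
use std::io::{self, BufRead};

/// Hypothetical variant: yields each word as Ok, and a failed line as one Err,
/// without building a one-element Vec for the error.
fn by_words_chained<R: BufRead>(reader: R) -> impl Iterator<Item = io::Result<String>> {
    reader.lines().flat_map(|line| {
        let (words, error) = match line {
            Ok(line) => {
                // Still allocates: the words must outlive the local `line`.
                let words: Vec<String> =
                    line.split_whitespace().map(str::to_owned).collect();
                (words, None)
            }
            // An empty Vec does not touch the allocator.
            Err(error) => (Vec::new(), Some(error)),
        };
        // Option is IntoIterator, so it chains as zero or one extra items.
        words.into_iter().map(Ok).chain(error.map(Err))
    })
}
```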

Cool. So collecting the result of split_whitespace().map(...) into a Vec doesn't allocate? I wouldn't have expected that...

Yeah, I can see that. What I had intended to do was to return all the items until I hit an error, and then return the error (and then go back to returning items, if the error is transient). I'm not entirely sure it's a good API design, but it's hard to be 100% sure when I'm not clear what is a sane thing to do when reading a file and getting an IO error on (say) line 100.
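For reference, those semantics can also be written as a hand-rolled iterator with fairly little state. This is a sketch, not the one true way; the Words type is hypothetical. It keeps the underlying Lines plus the words of the current line that are still to be handed out:

```rust
use std::io::{self, BufRead, Lines};
use std::vec;

/// Hypothetical word iterator: yields Ok(word) for each word and passes a
/// failed line through as a single Err, then carries on with the next line.
struct Words<R> {
    lines: Lines<R>,
    pending: vec::IntoIter<String>, // words of the current line not yet yielded
}

impl<R: BufRead> Words<R> {
    fn new(reader: R) -> Self {
        Words { lines: reader.lines(), pending: Vec::new().into_iter() }
    }
}

impl<R: BufRead> Iterator for Words<R> {
    type Item = io::Result<String>;

    fn next(&mut self) -> Option<Self::Item> {
        loop {
            // First drain whatever is left of the current line.
            if let Some(word) = self.pending.next() {
                return Some(Ok(word));
            }
            // Then fetch the next line; `?` ends iteration at end of input.
            match self.lines.next()? {
                Ok(line) => {
                    self.pending = line
                        .split_whitespace()
                        .map(str::to_owned)
                        .collect::<Vec<_>>()
                        .into_iter();
                }
                Err(error) => return Some(Err(error)),
            }
        }
    }
}
```

A caller can then write `for word in Words::new(reader) { let word = word?; ... }` to stop at the first error, or match on each item to skip errors and keep going.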

I can see why people recommend "just use unwrap() until you get more experienced"...

I'm not sure which you want, but RBE has a bunch of different ways to handle iterators of Results, in case one of them inspires you in some way: Iterating over Results - Rust By Example
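Two of the approaches from that page, sketched on string parsing rather than file reading (the function names here are mine, not from RBE): dropping the failures with filter_map, and separating successes from failures with partition.

```rust
use std::num::ParseIntError;

/// Keep only the values that parsed, silently dropping failures.
fn parse_skipping_errors(inputs: &[&str]) -> Vec<i32> {
    inputs.iter().filter_map(|s| s.parse::<i32>().ok()).collect()
}

/// Keep everything: successes in one Vec, errors in another.
fn parse_partitioned(inputs: &[&str]) -> (Vec<i32>, Vec<ParseIntError>) {
    let (oks, errs): (Vec<_>, Vec<_>) = inputs
        .iter()
        .map(|s| s.parse::<i32>())
        .partition(Result::is_ok);
    (
        // Safe to unwrap: partition put only Ok values in `oks`...
        oks.into_iter().map(Result::unwrap).collect(),
        // ...and only Err values in `errs`.
        errs.into_iter().map(Result::unwrap_err).collect(),
    )
}
```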

It does allocate. (That's why I was writing that there isn't an allocation-free version of this code as far as I understand.) And exactly because it does, the single-element vec![] isn't much of a concern, comparatively.

process_results from the itertools crate may be relevant.

Oh, right. Sorry, I understand what you meant now.

Nice, thanks. I'll take a look.

Nitpick: probably not unwrap, but expect (or even unwrap_or_else(|| panic!())), to give an error some description other than just the file-line-column triple.
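A small illustration of the difference, using made-up function names and port parsing as a stand-in for the file-reading case. Since parse returns a Result, the closure passed to unwrap_or_else takes the error as its argument:

```rust
/// expect: panics with a fixed description followed by the error's Debug output.
fn parse_port_expect(text: &str) -> u16 {
    text.parse::<u16>().expect("port must be a number between 0 and 65535")
}

/// unwrap_or_else + panic!: lets the message include runtime context,
/// here the offending input as well as the error.
fn parse_port_described(text: &str) -> u16 {
    text.parse::<u16>()
        .unwrap_or_else(|e| panic!("invalid port {text:?}: {e}"))
}
```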
