How to compose Iterators?

Hi folks

I was playing with iterators and I'm wondering what would be an elegant way of composing them. To illustrate: I have an iterator, Input, that yields words from strings. Then I have an iterator that yields lines from a buffer, std::io::Lines. Now I want to construct a third iterator, InputFile, that yields words from lines.

What I did: I stored an input: Option<Input> inside InputFile, and if it is not None I yield from it. Otherwise I read a line, set self.input, and yield its first word.

The code is very fuzzy in my opinion; there should be a clearer way to express "take Lines and compose them with Input".

#![allow(dead_code)]
#![allow(unused_imports)]

use std::fs;
use std::io::{self, BufRead, Cursor};
use std::iter::Iterator;
use std::path;
fn main() {}

#[derive(Debug)]
struct Input {
    input: String,
    offset: usize,
}

impl Input {
    fn new(input: String) -> Input {
        Input { input, offset: 0 }
    }
}

trait CountPredicate {
    fn count_predicate<P>(&mut self, predicate: P) -> usize
    where
        P: Fn(char) -> bool;
}

impl<'a, I> CountPredicate for I
where
    I: std::iter::Iterator<Item = char>,
{
    fn count_predicate<P>(&mut self, predicate: P) -> usize
    where
        P: Fn(char) -> bool,
    {
        let mut walked = 0;
        while let Some(val) = self.next() {
            if predicate(val) {
                walked += 1;
            } else {
                break;
            }
        }
        return walked;
    }
}

mod utils {
    pub fn is_whitespace(x: char) -> bool {
        x.is_whitespace()
    }

    pub fn is_not_whitespace(x: char) -> bool {
        !x.is_whitespace()
    }
}

/// Yields words for Input (a String wrapper)
impl Iterator for Input {
    type Item = String;
    fn next(self: &mut Input) -> Option<String> {
        let s = &self.input[self.offset..];
        let len = s.chars().count();
        if len == 0 {
            return None;
        }
        let it = &mut s.chars();

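        // count_predicate also consumes the first char that fails its predicate, so if a
        // word follows the leading whitespace, its first char has already been consumed
        // and is added back here (this keeps one-char words such as "a").
        // Char counts double as byte offsets below, so this assumes ASCII input.
        let left = it.count_predicate(utils::is_whitespace);
        let right = if left < len {
            left + 1 + it.count_predicate(utils::is_not_whitespace)
        } else {
            left
        };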

        self.offset += right;
        if right > left {
            return Some(String::from(&s[left..right]));
        }

        None
    }
}

struct InputFile<T> {
    reader: io::BufReader<T>,
    input: Option<Input>,
}

impl<'a> InputFile<fs::File> {
    fn open(path: &str) -> io::Result<InputFile<fs::File>> {
        Ok(InputFile {
            reader: io::BufReader::new(fs::File::open(path)?),
            input: None,
        })
    }

    fn from_string(s: String) -> InputFile<Cursor<String>> {
        InputFile {
            reader: io::BufReader::new(Cursor::new(s)),
            input: None,
        }
    }
}

/// Yields words for an input file
impl<T> Iterator for InputFile<T>
where
    // BufReader<T> supplies read_line, so T only needs Read (fs::File is not BufRead).
    T: io::Read,
{
    type Item = String;
    fn next(&mut self) -> Option<String> {
        loop {
            // Keep yielding words from the current line while it has any.
            if let Some(word) = self.input.as_mut().and_then(|input| input.next()) {
                return Some(word);
            }
            // Current line exhausted (or none read yet): read the next one.
            let mut line = String::new();
            match self.reader.read_line(&mut line) {
                // EOF or an I/O error ends the iteration.
                Ok(0) | Err(_) => return None,
                // A blank line yields no words; the loop just moves on to the next line.
                Ok(_) => self.input = Some(Input::new(line)),
            }
        }
    }
}

#[cfg(test)]
mod test {
    use super::*;

    #[test]
    fn test_count_predicate() {
        let ws = |x: char| x.is_whitespace();
        let nws = |x: char| !x.is_whitespace();
        assert_eq!(3, "foo ".chars().count_predicate(nws));
        assert_eq!(3, "foo".chars().count_predicate(nws));
        assert_eq!(0, " foo".chars().count_predicate(nws));
        assert_eq!(3, "   foo".chars().count_predicate(ws));
        assert_eq!(1, " bar".chars().count_predicate(ws));
    }

    #[test]
    fn test_input_iter() {
        //assert_eq!(
        //    vec!["foo", "bar"],
        //    Input::new("foo bar".into()).collect::<Vec<String>>()
        //);
        assert_eq!(
            vec!["foo", "bar", "tar", "zar"],
            Input::new("  foo bar     tar  zar ".into()).collect::<Vec<String>>()
        );
    }

    #[test]
    fn test_input_file() -> io::Result<()> {
        assert_eq!(
            vec!["foo", "bar", "tar", "zar"],
            InputFile::from_string("foo bar\ntar zar".into()).collect::<Vec<String>>()
        );

        Ok(())
    }
}


It sounds like you are reimplementing flat_map.
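For reference, here is a minimal sketch of the state flat_map keeps, written by hand so it can be compared with InputFile: an outer iterator, the current inner iterator (the role of InputFile's input field), and a closure that turns each outer item into a new inner iterator. MyFlatMap and my_flat_map are hypothetical names; the standard library's FlatMap also accepts any IntoIterator and supports iterating from the back, which this sketch omits.

```rust
/// Hypothetical, simplified stand-in for `std::iter::FlatMap`.
struct MyFlatMap<O, U, F> {
    outer: O,         // e.g. the lines
    inner: Option<U>, // e.g. the words of the current line
    f: F,             // turns one outer item into an inner iterator
}

/// Constructor, so the closure's argument type is inferred from the bound.
fn my_flat_map<O, U, F>(outer: O, f: F) -> MyFlatMap<O, U, F>
where
    O: Iterator,
    U: Iterator,
    F: FnMut(O::Item) -> U,
{
    MyFlatMap { outer, inner: None, f }
}

impl<O, U, F> Iterator for MyFlatMap<O, U, F>
where
    O: Iterator,
    U: Iterator,
    F: FnMut(O::Item) -> U,
{
    type Item = U::Item;

    fn next(&mut self) -> Option<U::Item> {
        loop {
            // Yield from the current inner iterator while it has items.
            if let Some(item) = self.inner.as_mut().and_then(|inner| inner.next()) {
                return Some(item);
            }
            // Inner iterator exhausted: pull the next outer item, or stop at its end.
            let next_outer = self.outer.next()?;
            self.inner = Some((self.f)(next_outer));
        }
    }
}
```

Note that the loop moves past empty inner iterators (for example blank lines) instead of stopping at the first one.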


Hi Alice, thank you for pointing this out! In fact, I could replace a lot of code with:

        assert_eq!(
            vec!["foo", "bar", "tar", "zar"],
            io::BufReader::new(Cursor::new("foo bar\ntar zar".to_string()))
                .lines()
                .flat_map(|x| Input::new(x.unwrap()))
                .collect::<Vec<String>>()
        );

LoL, I will refactor my code

Thanks!
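One detail in the snippet above is worth spelling out. It works because Input owns its String; the similar-looking .flat_map(|x| x.unwrap().split_whitespace()) is rejected by the borrow checker, since the words would borrow a line that is dropped when the closure returns. A minimal sketch, assuming owned String words are wanted and that stopping at the first I/O error is acceptable instead of panicking in unwrap(); words_from_reader is a hypothetical name, and split_whitespace stands in for Input:

```rust
use std::io::BufRead;

/// Hypothetical helper: yields every whitespace-separated word of `reader`.
fn words_from_reader<R: BufRead>(reader: R) -> impl Iterator<Item = String> {
    reader
        .lines()
        // `lines()` yields io::Result<String>; stop at the first error
        // rather than panicking the way `unwrap()` would.
        .map_while(Result::ok)
        // Copy each word into its own String so the inner iterator no longer
        // borrows `line`, which is dropped when the closure returns.
        .flat_map(|line| {
            line.split_whitespace()
                .map(String::from)
                .collect::<Vec<String>>()
        })
}
```

If errors must be reported rather than quietly ending the iteration, the Result has to be carried through the chain instead; map_while is simply the shortest option for a sketch.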


Expanding on the first question: how can I use flat_map to combine two iterators?

I have this:

    fn lines_to_words<'a, I>(lines: I) -> I
    where
        I: Iterator<Item = &'a str>,
    {
        lines.flat_map(|line| line.split(" "))
    }

Obviously this doesn't work

   Compiling myml v0.1.0 (/Users/gecko/code/myml)
error[E0308]: mismatched types
   --> src/main.rs:199:9
    |
195 |     fn lines_to_words<'a, I>(lines: I) -> I
    |                           -               - expected `I` because of return type
    |                           |
    |                           this type parameter
...
199 |         lines.flat_map(|line| line.split(" "))
    |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ expected type parameter `I`, found struct `std::iter::FlatMap`
    |
    = note: expected type parameter `I`
                       found struct `std::iter::FlatMap<I, std::str::Split<'_, &str>, [closure@src/main.rs:199:24: 199:46]>`
    = help: type parameters must be constrained to match other types
    = note: for more information, visit https://doc.rust-lang.org/book/ch10-02-traits.html#traits-as-parameters

error: aborting due to previous error

For more information about this error, try `rustc --explain E0308`.
error: could not compile `myml`.

To learn more, run the command again with --verbose.

The idea here is the same as before: I want to receive an Iterator over strings (lines) and return an Iterator over strings that yields the words in each line. I've been struggling to get the FlatMap from lines.flat_map(|x| x.split(" ")) converted to a plain Iterator<Item = &'a str>, but I have no idea how to do it...

I think this is a generics problem, not an iterators problem. Your code:

    fn lines_to_words<'a, I>(lines: I) -> I
    where
        I: Iterator<Item = &'a str>,

What this says is "the caller may choose any type I as long as that type implements Iterator<Item = &'a str>. This function takes and returns a value of type I." - but how could you get a value of type I that works for any possible choice of I? There's no way to implement that except to return lines. That's not what you want though; you want to be able to return a different concrete type than the type of lines, even if the two types implement the same trait.

Basically, the point I'm getting at here is that generic types are chosen by the caller, but it doesn't make sense for this function to let the caller choose the return type.
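To make "generic types are chosen by the caller" concrete, a small sketch (pass_through is a hypothetical name): with -> I, each call site fixes I, and the body can only hand back an I it already has.

```rust
/// With `-> I` the return type is whatever the caller chose for `I`,
/// so the body can only hand back an `I` it already has: `lines`.
fn pass_through<'a, I>(lines: I) -> I
where
    I: Iterator<Item = &'a str>,
{
    lines
}
```

For example, pass_through("a\nb".lines()) returns a std::str::Lines, while pass_through(vec!["a"].into_iter()) returns a std::vec::IntoIter<&str>; a FlatMap is never an option.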

Instead, what you want is this:

    fn lines_to_words<'a, I>(lines: I) -> impl Iterator<Item = &'a str>
    where
        I: Iterator<Item = &'a str>,

It seems a bit redundant to have to write the trait bound twice, but that's the way to do it. This says "the caller may choose a type I that implements Iterator<Item = &'a str> and pass it as a parameter. The function will return some unspecified type that implements Iterator<Item = &'a str>, with the concrete type determined by the implementation of the function." That gives you the freedom to return the FlatMap iterator.
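Putting that signature together with the body from the question gives a version that should compile. The caller still picks I, while the function picks the unnamed FlatMap return type:

```rust
/// The caller picks the input type `I`; the function picks the concrete
/// (unnamed) return type, which here is a `FlatMap`.
fn lines_to_words<'a, I>(lines: I) -> impl Iterator<Item = &'a str>
where
    I: Iterator<Item = &'a str>,
{
    // Same body as in the question.
    lines.flat_map(|line| line.split(" "))
}
```

Because split(" ") treats every single space as a separator, "foo  bar" produces an empty string between the two words; split_whitespace() avoids that if it matters.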

There are some functions (e.g. Iterator::collect()) where the caller can choose the return type. This also sometimes confuses people.
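A small sketch of that case, using a hypothetical collect_demo function: the same chain is collected three times, and the annotated target type decides which FromIterator implementation runs.

```rust
use std::collections::BTreeSet;

/// Hypothetical demo: one iterator chain, three caller-chosen return types.
/// `collect` is generic over its return type `B: FromIterator<Self::Item>`.
fn collect_demo() -> (Vec<&'static str>, BTreeSet<&'static str>, String) {
    let words = || "b a b c".split(' ');
    let as_vec: Vec<&str> = words().collect(); // keeps order and duplicates
    let as_set: BTreeSet<&str> = words().collect(); // sorted, duplicates removed
    let joined: String = words().collect(); // concatenates the &str items
    (as_vec, as_set, joined)
}
```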


Thanks @bheisler

I'm still struggling with the type system. I will try to read more to see if I can build some intuition for trait implementations. Thanks again, your answer worked like a charm. I tried to refactor it into a trait and got bitten by the compiler again. I feel like I'm missing some foundations.

Best regards
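On the trait refactoring mentioned above: a minimal sketch of one way to turn lines_to_words into an extension trait, assuming a hypothetical WordsExt name. Closure types cannot be named, so this version boxes the FlatMap instead of spelling out its type. Since Rust 1.75, -> impl Iterator<Item = &'a str> is also accepted as a trait method's return type, which avoids the allocation.

```rust
/// Hypothetical extension trait adding `.words()` to iterators over `&str` lines.
trait WordsExt<'a>: Iterator<Item = &'a str> {
    fn words(self) -> Box<dyn Iterator<Item = &'a str> + 'a>
    where
        Self: Sized + 'a,
    {
        // Boxing hides the FlatMap type, whose closure type has no name.
        Box::new(self.flat_map(|line: &'a str| line.split_whitespace()))
    }
}

// Blanket impl: every iterator over `&'a str` gets `.words()`.
impl<'a, I: Iterator<Item = &'a str>> WordsExt<'a> for I {}
```

With the trait in scope, "foo bar\ntar zar".lines().words() yields the four words.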
