Owning converting iterator

Consider the following code:

struct SplitConverter<T: std::str::FromStr> {
    value: String,
    next_index: usize,
    _v1: std::marker::PhantomData<T>,
}

impl<T: std::str::FromStr> SplitConverter<T> {
    pub fn new(value: String) -> Self {
        Self {
            value,
            next_index: 0,
            _v1: std::marker::PhantomData,
        }
    }
}

impl<T: std::str::FromStr> Iterator for SplitConverter<T> {
    type Item = std::result::Result<T, T::Err>;

    fn next(&mut self) -> std::option::Option<Self::Item> {
        fn consume_while<T: std::str::FromStr, F: Fn(char) -> bool>(
            iter: &mut SplitConverter<T>,
            f: F,
        ) {
            if iter.next_index >= iter.value.len() {
                return;
            }
            let substr = iter.value[iter.next_index..].char_indices();
            for (i, c) in substr {
                if !f(c) {
                    iter.next_index += i;
                    return;
                }
            }
            iter.next_index = iter.value.len();
        }

        consume_while(self, char::is_whitespace);
        let start = self.next_index;
        consume_while(self, |c| !char::is_whitespace(c));
        let end = self.next_index;
        let substr = &self.value[start..end];

        if start != end {
            Some(substr.parse())
        } else {
            None
        }
    }
}

fn main() {
    println!("Enter some numbers and finish with empty line:");
    let sum : i32 = std::io::stdin()
        .lock()
        .lines()
        .take_while(|l| l.as_ref().and_then(|s| Ok(s.len() > 0)).unwrap_or(false))
        .filter_map(Result::ok)
        .flat_map(SplitConverter::<i32>::new)
        .filter_map(|r| match r {
            Err(e) => {
                println!("Parse error: {}", e);
                None
            }
            Ok(val) => Some(val),
        })
        .sum();

    println!("Sum of numbers: {}", sum);
}

This code is tying to read, split, convert, and sum integers on multiple lines until an empty line is encountered. While writing this, I encountered the following problems:

  1. Could not use String::split_whitespace() as it would only borrow and not move/own the string to be usable in flat_map()
  2. Wanted to avoid collect() inside the flat_map() for efficiency
  3. Had to add PhantomData to the SplitConverter, as the compiler

Do you have any suggestions of how to make the code a little more appealing? :smiley: i.e achieve a similar main() without writing my own Iterator, use String::split_whitespace() or at least avoid the PhantomData... Thanks in advance!

You are using way to many absolute paths, e.g. std::marker::PhantomData or std::result::Result. Either they are already in the prelude (for Result and Option for example) or you can import them into your scope by using the use keyword. See chapter 7.4 of the book.


It would be great, if you would have provided some testcases, so we can see what your struct/program does. See chapter 11 of the book (you can also use them on the playground).

I think this function does exactly the same as your struct does. Let me know if I overlooked something:

fn split_text<T: FromStr>(s: &str) -> impl Iterator<Item = Result<T, T::Err>> + '_ {
    s.trim_start()
        .lines()
        .take_while(|line| !line.trim().is_empty())
        .map(|x| x.trim().parse())
}

This was my testcase:

#[test]
fn tc1() {
    let input = r#"
    3
    5
    899
    123
    
    "#;
    let first: Result<Vec<u32>, _> = SplitConverter::new(input.to_string()).collect();
    let second: Result<Vec<u32>, _> = split_text(input).collect();
    assert_eq!(first, second);
}

Thank you for your reply @hellow. My point was that to use the Iterator inside of a flat_map(), which I was not able to achieve with your function which uses a borrowed value. Try this test case:

#[test]
fn test_sth() {
    let lines = vec!["1 2".to_owned(), "3 0".to_owned()].into_iter();
    let sum: i32 = lines
        .flat_map(SplitConverter::<i32>::new)
        .filter_map(Result::ok)
        .sum();
    assert_eq!(sum, 6);
}```

Unfortunately, I get a Iterator<Type=Result<String,_>> from std::io::stdin().lock().lines(). My problem would be solved if I could convert that into Iterator<Type=Result<&str,_>>.

Not sure if this helps but if you change SplitConverter to hold a &'a str then you get your signature and you can just borrow the lines.

struct SplitConverter<'a, T: std::str::FromStr> {
    value: &'a str,
    next_index: usize,
    _v1: std::marker::PhantomData<T>,
}
...
// in order to leave the flat_map call the same
// SplitConverter::new is now
fn new<S: AsRef<str>>(value: &'a S) -> Self;

// and finally the test case you supplied
let lines = vec!["1 2".to_owned(), "3 0".to_owned()];
    // need to split it up because of lifetimes and
    // drop order when borrowing
    let sum: i32 = lines.iter()
        // SplitConverter now takes a &str 
        .flat_map(SplitConverter::<i32>::new)
        .filter_map(Result::ok)
        .sum();
    assert_eq!(sum, 6);

if you want to check it out in full I made a playground.

Thank you @DevinR528, but this also does not work when I consume directly from stdin().lock().lines().

fn main() {
    let sum: i32 = std::io::stdin().lock().lines()
        .filter_map(Result::ok)
        // SplitConverter now takes a &str 
        .flat_map(SplitConverter::<i32>::new)
        .filter_map(Result::ok)
        .sum();
    println!("{}", sum);
    assert_eq!(sum, 6);
}

Maybe I should rephrase my problem as:
I just want to borrow for one iteration, but the iterator is throwing values at me

There might be at least three solutions to this:

  1. A version of String::split() that takes ownership of the string instead of borrowing -> less generic library code
  2. A generic iterator adapter that "holds and borrows" the value to the actual borrowing iterator -> probably impossible to specify the lifetime of the borrowed value
  3. Convert code to loop to be able to specify lifetime of value exactly -> subjectively aesthetically less pleasing

The problem is that you have nothing to tie your return lifetime to

// this works fine when line: &'a str
fn parse_num<'a, T: 'a + Sized + FromStr>(line: String)
        -> impl Iterator<Item=T> + 'a 
    {
        line.split_whitespace().map(|item| item.parse::<T>()).flat_map(Result::ok)
    }

let sum: i32 = lines.into_iter()
    .flat_map(|line| parse_num::<i32>(line))
    .sum();

as you already said collecting it up and returning a Vec, not ideal although I wouldn't be surprised if this allocation is optimized out you could look through the IR? I'm clearly no expert on that.

Last option creating a struct. This took me two days of tinkering :sweat_smile: , it doesn't address what you would actually be using it for but it could be done. playground unfortunately you would have to use nightly

Whew I think I may have hit every requirement, I hope at least. This was a fun one to mess with. I was able to keep it generic but added a whole bunch of bounds to T, there is no collect, uses only stable and the type can be given at the call site.

struct Holder<'a, T>(&'a str, PhantomData<T>);

// now the important part
impl<'a, T> Iterator for Holder<'a, T> 
where
    T: AddAssign + Default + FromStr + PartialEq
{
    type Item = T;
    fn next(&mut self) -> Option<Self::Item> {
        let mut total = T::default();
        let mut iter = self.0.split_whitespace();
        
        while let Some(item) = iter.next() {
            total += item.parse::<T>().ok()?
        }
        if total == T::default() {
            None
        } else {
            Some(total)
        }
    }
}

Playground

The underlying issue is also related to BufRead::lines. This method allocates a String for each line. This is best possible for the Iterator API as it doesn't allow returning a reference to the internal buffer.
Perhaps streaming iterator API declared as next : fn(&mut self) -> &Self::Item allows both allocation-minimal lines and owned split_whitespace.

I suggest to do the processing imperatively rather than using iterator if you care about performance in this case my take.

1 Like

Well, it seems like streaming-iterator crate doesn't have required pieces so I ended up with a substantial implementation. Here is my solution https://gist.github.com/pcpthm/14575632103430988eb8507fc567db2b.
Excerpt from main:

fn main() {
    println!("Enter some numbers and finish with empty line:");

    let sum: i32 = std::io::stdin()
        .lock()
        .lines()
        .take_while(|l| l.as_ref().and_then(|s| Ok(s.len() > 0)).unwrap_or(false))
        .filter_map(Result::ok)
        .map(SplitWhitespace::new)
        .flatten_stream()
        .map_value(i32::from_str)
        .filter_map(|r| match r {
            Err(e) => {
                println!("Parse error: {}", e);
                None
            }
            Ok(val) => Some(val),
        })
        .sum();
    println!("Sum of numbers: {}", sum);
}
1 Like

Thank you for pointing me to the streaming_iterator crate - looks very much like the correct approach for this usecase.
Regarding split_whitespace() it would be indeed nice to have an owning version (e.g. into_split_whitespace()) in the std library - or some kind of adapter that owns the value and borrows it to the Iterator.