struct SplitConverter<T: std::str::FromStr> {
value: String,
next_index: usize,
_v1: std::marker::PhantomData<T>,
}
impl<T: std::str::FromStr> SplitConverter<T> {
pub fn new(value: String) -> Self {
Self {
value,
next_index: 0,
_v1: std::marker::PhantomData,
}
}
}
impl<T: std::str::FromStr> Iterator for SplitConverter<T> {
type Item = std::result::Result<T, T::Err>;
fn next(&mut self) -> std::option::Option<Self::Item> {
fn consume_while<T: std::str::FromStr, F: Fn(char) -> bool>(
iter: &mut SplitConverter<T>,
f: F,
) {
if iter.next_index >= iter.value.len() {
return;
}
let substr = iter.value[iter.next_index..].char_indices();
for (i, c) in substr {
if !f(c) {
iter.next_index += i;
return;
}
}
iter.next_index = iter.value.len();
}
consume_while(self, char::is_whitespace);
let start = self.next_index;
consume_while(self, |c| !char::is_whitespace(c));
let end = self.next_index;
let substr = &self.value[start..end];
if start != end {
Some(substr.parse())
} else {
None
}
}
}
fn main() {
println!("Enter some numbers and finish with empty line:");
let sum : i32 = std::io::stdin()
.lock()
.lines()
.take_while(|l| l.as_ref().and_then(|s| Ok(s.len() > 0)).unwrap_or(false))
.filter_map(Result::ok)
.flat_map(SplitConverter::<i32>::new)
.filter_map(|r| match r {
Err(e) => {
println!("Parse error: {}", e);
None
}
Ok(val) => Some(val),
})
.sum();
println!("Sum of numbers: {}", sum);
}
This code is tying to read, split, convert, and sum integers on multiple lines until an empty line is encountered. While writing this, I encountered the following problems:
Could not use String::split_whitespace() as it would only borrow and not move/own the string to be usable in flat_map()
Wanted to avoid collect() inside the flat_map() for efficiency
Had to add PhantomData to the SplitConverter, as the compiler
Do you have any suggestions of how to make the code a little more appealing? i.e achieve a similar main() without writing my own Iterator, use String::split_whitespace() or at least avoid the PhantomData... Thanks in advance!
You are using way to many absolute paths, e.g. std::marker::PhantomData or std::result::Result. Either they are already in the prelude (for Result and Option for example) or you can import them into your scope by using the use keyword. See chapter 7.4 of the book.
It would be great, if you would have provided some testcases, so we can see what your struct/program does. See chapter 11 of the book (you can also use them on the playground).
Thank you for your reply @hellow. My point was that to use the Iterator inside of a flat_map(), which I was not able to achieve with your function which uses a borrowed value. Try this test case:
Unfortunately, I get a Iterator<Type=Result<String,_>> from std::io::stdin().lock().lines(). My problem would be solved if I could convert that into Iterator<Type=Result<&str,_>>.
Not sure if this helps but if you change SplitConverter to hold a &'a str then you get your signature and you can just borrow the lines.
struct SplitConverter<'a, T: std::str::FromStr> {
value: &'a str,
next_index: usize,
_v1: std::marker::PhantomData<T>,
}
...
// in order to leave the flat_map call the same
// SplitConverter::new is now
fn new<S: AsRef<str>>(value: &'a S) -> Self;
// and finally the test case you supplied
let lines = vec!["1 2".to_owned(), "3 0".to_owned()];
// need to split it up because of lifetimes and
// drop order when borrowing
let sum: i32 = lines.iter()
// SplitConverter now takes a &str
.flat_map(SplitConverter::<i32>::new)
.filter_map(Result::ok)
.sum();
assert_eq!(sum, 6);
if you want to check it out in full I made a playground.
A version of String::split() that takes ownership of the string instead of borrowing -> less generic library code
A generic iterator adapter that "holds and borrows" the value to the actual borrowing iterator -> probably impossible to specify the lifetime of the borrowed value
Convert code to loop to be able to specify lifetime of value exactly -> subjectively aesthetically less pleasing
The problem is that you have nothing to tie your return lifetime to
// this works fine when line: &'a str
fn parse_num<'a, T: 'a + Sized + FromStr>(line: String)
-> impl Iterator<Item=T> + 'a
{
line.split_whitespace().map(|item| item.parse::<T>()).flat_map(Result::ok)
}
let sum: i32 = lines.into_iter()
.flat_map(|line| parse_num::<i32>(line))
.sum();
as you already said collecting it up and returning a Vec, not ideal although I wouldn't be surprised if this allocation is optimized out you could look through the IR? I'm clearly no expert on that.
Last option creating a struct. This took me two days of tinkering , it doesn't address what you would actually be using it for but it could be done. playground unfortunately you would have to use nightly
Whew I think I may have hit every requirement, I hope at least. This was a fun one to mess with. I was able to keep it generic but added a whole bunch of bounds to T, there is no collect, uses only stable and the type can be given at the call site.
struct Holder<'a, T>(&'a str, PhantomData<T>);
// now the important part
impl<'a, T> Iterator for Holder<'a, T>
where
T: AddAssign + Default + FromStr + PartialEq
{
type Item = T;
fn next(&mut self) -> Option<Self::Item> {
let mut total = T::default();
let mut iter = self.0.split_whitespace();
while let Some(item) = iter.next() {
total += item.parse::<T>().ok()?
}
if total == T::default() {
None
} else {
Some(total)
}
}
}
The underlying issue is also related to BufRead::lines. This method allocates a String for each line. This is best possible for the Iterator API as it doesn't allow returning a reference to the internal buffer.
Perhaps streaming iterator API declared as next : fn(&mut self) -> &Self::Item allows both allocation-minimal lines and owned split_whitespace.
I suggest to do the processing imperatively rather than using iterator if you care about performance in this case my take.
Thank you for pointing me to the streaming_iterator crate - looks very much like the correct approach for this usecase.
Regarding split_whitespace() it would be indeed nice to have an owning version (e.g. into_split_whitespace()) in the std library - or some kind of adapter that owns the value and borrows it to the Iterator.