Cannot flat_map split_ascii_whitespace

I've got a file containing a huge number of integers spread over many lines. The integers are grouped into records. Each record starts with the number of values belonging to the record, followed by the values themselves. (The line breaks do not have any meaning.)
The example below shows 2 records (the 1st has 5 values [1,2,3,4,5], the 2nd has 1 value [1]). To make it easy to parse the values into records, I want to create an Iterator<Item=i64>, but I always get errors from the compiler. E.g.

use std::io::{BufRead, BufReader};

fn get_input_numbers() -> impl Iterator<Item=i64>  {
    let lines = std::io::Cursor::new("5 1 2 3\n4 5 1 1".to_string());
    BufReader::new(lines).lines().map(|l| l.unwrap())
        .flat_map(|l| l.split_ascii_whitespace()).map(|n| n.parse::<i64>().unwrap())
        //            ^ cannot return reference to function parameter `l`
        //              returns a reference to data owned by the current function rustc(E0515)
}

What is the idiomatic Rust code to convert such data into an Iterator<Item=i64>?
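For reference, the record layout described above (a count, followed by that many values) could be consumed from such an iterator roughly like this. This is only a sketch: `next_record` is a hypothetical helper name, and reporting a truncated or negative count as an error string is an assumption, not something the question specifies.

```rust
use std::convert::TryFrom;

/// Hypothetical helper: pulls one record (count, then `count` values)
/// from a flat stream of numbers. Returns None when the stream is exhausted.
fn next_record(numbers: &mut impl Iterator<Item = i64>) -> Option<Result<Vec<i64>, String>> {
    // The first number of a record is the count of values that follow.
    let count = numbers.next()?;
    let count = match usize::try_from(count) {
        Ok(n) => n,
        Err(_) => return Some(Err(format!("invalid record length {}", count))),
    };
    // Take exactly `count` values without consuming the rest of the stream.
    let record: Vec<i64> = numbers.by_ref().take(count).collect();
    if record.len() < count {
        // Assumption: a record cut short by end of input is treated as an error.
        return Some(Err(format!(
            "record truncated: expected {} values, found {}",
            count,
            record.len()
        )));
    }
    Some(Ok(record))
}
```

Calling it repeatedly on the example input would yield [1,2,3,4,5], then [1], then None, with only one record held in memory at a time.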

The split methods return an iterator whose lifetime is tied to the original string (it yields slices into that string). Since l here is a String that the closure owns, it is dropped at the end of that closure, while the returned iterator would still borrow from it. The easy way out is to collect into a Vec<String>.

The alternative is to create your own data structure to hold on to the reader and some other buffer, e.g. a String holding the current line, and then iteratively parse and refill the buffer. Probably returning an Item = Result<i64, io::Error>.
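A minimal sketch of that idea, assuming a struct named `Numbers` (a made-up name) that owns the reader together with a reusable String for the current line and a byte offset into it. Mapping parse failures to `io::ErrorKind::InvalidData` is my own choice here so that the item type can stay `Result<i64, io::Error>`.

```rust
use std::io::{self, BufRead};

/// Hypothetical iterator that owns a reader and a reusable line buffer.
struct Numbers<R> {
    reader: R,
    line: String, // current line, refilled once it is exhausted
    pos: usize,   // byte offset of the first unparsed character in `line`
}

impl<R: BufRead> Numbers<R> {
    fn new(reader: R) -> Self {
        Numbers { reader, line: String::new(), pos: 0 }
    }
}

impl<R: BufRead> Iterator for Numbers<R> {
    type Item = io::Result<i64>;

    fn next(&mut self) -> Option<Self::Item> {
        loop {
            // Skip whitespace in front of the next token on the current line.
            let rest = self.line[self.pos..].trim_start();
            if !rest.is_empty() {
                let start = self.line.len() - rest.len();
                let len = rest.find(char::is_whitespace).unwrap_or(rest.len());
                let parsed = rest[..len]
                    .parse::<i64>()
                    .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e));
                // Advance past the token, whether or not it parsed.
                self.pos = start + len;
                return Some(parsed);
            }
            // Current line is used up: reuse the buffer for the next one.
            self.line.clear();
            self.pos = 0;
            match self.reader.read_line(&mut self.line) {
                Ok(0) => return None, // end of input
                Ok(_) => {}
                Err(e) => return Some(Err(e)),
            }
        }
    }
}
```

With a file this would be used as something like `Numbers::new(BufReader::new(file))`. Only one line is held in memory at a time, and no per-token Strings are allocated.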


What do you want your iterator to return from the example input?

# Everything
5 1 2 3 4 5 1 1
# Non-counts
1 2 3 4 5 1

If the latter, what should this input return?

3 1 2

@erelde Because the file contains a huge number of values, I do not want to read the entire file into a String or Vec<String>.
@quinedot The iterator should return everything. I'll try to implement your idea "to hold on to the reader and some other buffer, e.g. a String holding the current line".

Thanks for your hints.

Collecting won't read the entire file into memory; you only need to hold on to the current line. That memory has to be kept around somewhere.

That's evident, and it's not what @erelde suggested, either. The solution would be:

use std::io::{BufRead, BufReader, Cursor};

fn get_input_numbers() -> impl Iterator<Item=i64>  {
    let lines = Cursor::new("5 1 2 3\n4 5 1 1".to_string());

    BufReader::new(lines)
        .lines()
        .flat_map(|l| {
            // Collect owned Strings so nothing borrows from the line once it is dropped.
            l.unwrap().split_ascii_whitespace().map(str::to_string).collect::<Vec<_>>()
        })
        .map(|n| n.parse::<i64>().unwrap())
}

This only allocates memory for a single line.

Yep. Though now that I see it written I think it should be a single flat_map step to avoid the temporary small strings.

You can also consider moving the parse inside the flat_map:

    BufReader::new(lines)
        .lines()
        .flat_map(|l| {
          l.unwrap().split_ascii_whitespace().map(|w| {
            w.parse::<i64>.unwrap()
          })
        })

For example.

@quinedot Thanks for the example.
@riking I copied your suggestion verbatim into my function (adding the missing parentheses after parse::<i64>) and got the same error from the compiler: "cannot return value referencing temporary value".
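That error is expected: the closure hands back a lazy iterator that still borrows the temporary String produced by l.unwrap(), and that String is dropped when the closure returns. One way around it, sketched here on the assumption that a small Vec<i64> per line is acceptable, is to parse inside the flat_map and collect the numbers before the line goes away. It reuses the function name and sample input from the question.

```rust
use std::io::{BufRead, BufReader, Cursor};

fn get_input_numbers() -> impl Iterator<Item = i64> {
    let lines = Cursor::new("5 1 2 3\n4 5 1 1".to_string());

    BufReader::new(lines).lines().flat_map(|l| {
        // Parse while the line is still alive, then return owned i64 values.
        // The Vec<i64> does not borrow from the line, so dropping the line is fine.
        l.unwrap()
            .split_ascii_whitespace()
            .map(|w| w.parse::<i64>().unwrap())
            .collect::<Vec<_>>()
    })
}
```

Compared with collecting a Vec<String>, this avoids the temporary small strings and still holds only one line's worth of data at a time.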

@H2CO3 thanks for the code, it compiles!
@erelde sorry that I interpreted your answer in the wrong way (my fault)
