How to use a file iterator to read lines for my use case?

Hello. Budding Rustacean here.

I have a file with numbers arranged in a matrix of size 5x5. Each such matrix has an empty line before it. The numbers in a row are separated by white-space. Every time I encounter an empty line, I want to read each of the next 5 rows one by one into a vector and do some processing on it.

Here's a sample of such a text file:


 7 42 22 92 60
 8 88 99 13 12
16 62 86 24 77
20 57 19 67 46
36 83 54 63 82

 7 86 50 78 16
83 45 67 94 58
21 98 99 85 43
71 19 31 22  4
70 51 34 11 61

My attempt at writing code for this runs into borrow issues - so rather than debugging my code, I think it's better if the rust community could suggest some idiomatic ways to accomplish this.

My point with this post is twofold:

  1. Solve the issue that I currently have.
  2. To see how people have dealt with custom input files and fed them into Rust data-structures in an idiomatic fashion. So if you've got some instances of custom file reading and the code you used to solve it (I'm a newbie, so anything you post will be a good learning experience for me!), I'd love to read about it! (This is entirely optional, my main issue is point 1 above).

Can't wait to see all your suggestions!

I’ll have to assume a few things you don’t specify explicitly.

I’ll assume there can be an arbitrary number of such matrices.

I assume “processing” refers to something different from just “parsing” them. The parsing part might be easiest if it’s done separately from the further “processing”.

There’s some different approaches; you could kind-of “stream” the file, i.e. do the processing as you read it; or you could do a sequence of batch-processing steps, i.e. first read the whole file into memory, then parse the whole file into some data structure, then process everything.

I’ll assume the files are rather small, so putting everything into memory is feasible, which can be more straightforward than a “streaming” approach. I’ll also assume you want some form of 5x5-matrix-like structure, and I’ll assume the entries are nonnegative integers, so let’s try to achieve some [[u32; 5]; 5] 2-dimensional arrays.


First step, read the file. Using std::fs::read_to_string is simplest.

let data = fs::read_to_string("...path...");

On a String, you can split it into lines, using the lines method.

let lines = data.lines(); // this is an iterator of lines (type &str)

This will contain both the empty lines and the non-empty lines. Let’s also ignore leading and trailing whitespace on each line; we can do that in the same line of code, so instead of the above:

let lines = data.lines().map(|line| line.trim());
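
To make the effect of the trimming concrete, here is a minimal sketch. The helper name trimmed_lines is made up for illustration, and it collects into a Vec only so the result is easy to inspect; the walkthrough itself keeps working with the lazy iterator.

```rust
/// Split `data` into lines and strip leading/trailing whitespace from each.
/// (Hypothetical helper, not part of the original walkthrough.)
fn trimmed_lines(data: &str) -> Vec<&str> {
    data.lines().map(str::trim).collect()
}
```

An input such as " 7 42 \n\n8 88  " yields the three entries "7 42", "" and "8 88", so an empty (or whitespace-only) line stays in the sequence as an empty string that can be detected with is_empty().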

Now we need to find starting points indicated by empty lines, and for each of them consider the next 5 lines. Or something like that… see… you’re not quite clear how lenient or strict we’re supposed to be about the file format; we might as well just expect the whole file to be of the form:

  • empty line
  • 5 lines of 5 numbers each
  • end of file or repeat the above

and error (for now, via panic) if anything doesn’t match. Actually that’s probably easier. If this doesn’t match your use-case, you might want to specify what else is allowed; e.g. additional empty lines, additional kinds of content without preceding empty lines, or whatever. Well in that case, we could just take chunks of 6 lines and make sure everything looks as expected. Let’s use the itertools crate for this.

let six_lines_each = lines.chunks(6);

Then, we can process the chunks and collect the resulting matrices in a Vec. Proper error handling would probably be nicer if this was a “production ready” piece of software, but panics for when anything goes wrong are probably easier.


So preparing a place to store the results

let mut matrices: Vec<[[u32; 5]; 5]> = Vec::new();

then iterate over the chunks

// `mut` is needed because calling `next()` on the chunk advances it
for mut chunk in &six_lines_each {
     // ...
}

Now, for each chunk, we expect, initially, an empty line

assert!(chunk.next().unwrap().is_empty());

Then 5 lines of contents, with 5 space-separated numbers. One way to do this is using zip. Prepare the array we want to write the result to, then zip with an iterator of lines/words, and do the writing for each entry. To also validate that there are exactly 5 more lines available and 5 words per line, using zip_eq from itertools might be a good idea. (It’s like zip, but also validates the lengths match.)

So prepare the resulting matrix:

let mut matrix = [[0_u32; 5]; 5];

Then zip_eq for the lines

for (line, row) in chunk.zip_eq(&mut matrix) {
    // ..
}

Here, the line is a &str, and the row is a &mut [u32; 5] where the line’s contents shall be written to.
Another zip_eq for the numbers, well, as soon as we have them split by whitespace

for (number, entry) in line.split_whitespace().zip_eq(row) {

}

so we have a number: &str that still needs parsing, and an entry: &mut u32 where the result shall be written.

Finally,

*entry = number.parse().unwrap();

to parse the string into a u32, and store the result.
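
If you would rather not panic on a malformed row, the same per-row step can be sketched with the standard library only. This is an illustration under my own assumptions: parse_row and its String error messages are hypothetical, and the explicit length checks stand in for what zip_eq does by panicking.

```rust
/// Parse one line of exactly five whitespace-separated numbers.
/// (Hypothetical helper; errors are plain strings to keep the sketch short.)
fn parse_row(line: &str) -> Result<[u32; 5], String> {
    let mut row = [0_u32; 5];
    let mut words = line.split_whitespace();
    for entry in row.iter_mut() {
        // Running out of words early means the row is too short.
        let word = words
            .next()
            .ok_or_else(|| format!("fewer than 5 numbers in {:?}", line))?;
        *entry = word
            .parse::<u32>()
            .map_err(|e| format!("bad number {:?}: {}", word, e))?;
    }
    // Anything left over means the row is too long.
    if words.next().is_some() {
        return Err(format!("more than 5 numbers in {:?}", line));
    }
    Ok(row)
}
```

The caller can then decide whether to unwrap, skip the row, or report the message.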

And after the inner loops, save the matrix

matrices.push(matrix);

In the end you’ll have a Vec<[[u32; 5]; 5]> that you can process further however you like.

→ Look here to see the whole code in the playground.
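
For reference, the steps above can also be put together without itertools. The following is a sketch, not the playground code: it swaps itertools' chunks and zip_eq for slice::chunks on a collected Vec plus explicit length assertions, and the names Matrix and parse_matrices are made up here. It takes the file contents as a &str, i.e. after the Result from read_to_string has been handled.

```rust
type Matrix = [[u32; 5]; 5];

/// Parse input made of repeated blocks: one empty line, then 5 rows of 5 numbers.
/// Panics if the input does not match that format.
fn parse_matrices(data: &str) -> Vec<Matrix> {
    let lines: Vec<&str> = data.lines().map(str::trim).collect();
    let mut matrices = Vec::new();

    for chunk in lines.chunks(6) {
        // The last chunk is shorter than 6 if the file is truncated.
        assert_eq!(chunk.len(), 6, "incomplete block at end of input");
        assert!(chunk[0].is_empty(), "expected an empty line before each matrix");

        let mut matrix = [[0_u32; 5]; 5];
        for (line, row) in chunk[1..].iter().zip(&mut matrix) {
            let words: Vec<&str> = line.split_whitespace().collect();
            assert_eq!(words.len(), 5, "expected 5 numbers in {:?}", line);
            for (word, entry) in words.iter().zip(row) {
                *entry = word.parse().unwrap();
            }
        }
        matrices.push(matrix);
    }
    matrices
}
```

Calling parse_matrices on the contents of the sample file should give two matrices, the first of which starts with the row [7, 42, 22, 92, 60].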


I'm not sure why you are running into borrowing issues, since reading from a file itself can't really be done purely using borrowed strings – you almost always need to allocate memory dynamically (String, Vec, etc.) in order to read contents of arbitrary form (and length) from a file.

If you take the easy path and use BufReader for chopping up the file into lines, you can do this (Playground):

let reader = BufReader::new(the_file);
let mut matrices = Vec::new();
let mut matrix = Vec::new();

for line in reader.lines() {
    let row = line?
        .split_whitespace()
        .map(str::parse::<u32>)
        .collect::<Result<Vec<_>, _>>()?;
        
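    if row.is_empty() {
        // blank line: finish the current matrix, but skip the empty one
        // that a leading (or repeated) blank line would otherwise push
        if !matrix.is_empty() {
            matrices.push(matrix);
            matrix = Vec::new();
        }
    } else {
        matrix.push(row);
    }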
}

// handle last matrix potentially lacking trailing empty line
if !matrix.is_empty() {
    matrices.push(matrix);
}

Here's a version which

  • is a data structure
  • reads on demand
  • reuses buffers
  • uses custom std::io::Error errors

It's simultaneously pretty strict about your input parameters (e.g. a single blank line must precede each 5 lines of numbers) and sparse on error details (e.g. no line numbers).
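
The linked code is not reproduced in this thread, so what follows is only a rough sketch of what a reader with those four properties could look like, not the original. All names (MatrixReader, fill_line, read_matrix, invalid) are hypothetical, and the strictness rules are my reading of the description above.

```rust
use std::io::{self, BufRead};

type Matrix = [[u32; 5]; 5];

/// Hypothetical reader that parses one matrix per `next()` call.
struct MatrixReader<R> {
    reader: R,
    buf: String, // reused for every line instead of allocating a new String
}

fn invalid(msg: &'static str) -> io::Error {
    io::Error::new(io::ErrorKind::InvalidData, msg)
}

impl<R: BufRead> MatrixReader<R> {
    fn new(reader: R) -> Self {
        Self { reader, buf: String::new() }
    }

    /// Read the next line into the shared buffer; `false` means end of input.
    fn fill_line(&mut self) -> io::Result<bool> {
        self.buf.clear();
        Ok(self.reader.read_line(&mut self.buf)? != 0)
    }

    fn read_matrix(&mut self) -> io::Result<Option<Matrix>> {
        if !self.fill_line()? {
            return Ok(None); // clean end of input between matrices
        }
        if !self.buf.trim().is_empty() {
            return Err(invalid("expected a blank line before a matrix"));
        }
        let mut matrix = [[0_u32; 5]; 5];
        for row in &mut matrix {
            if !self.fill_line()? {
                return Err(invalid("input ended inside a matrix"));
            }
            let mut words = self.buf.split_whitespace();
            for entry in row.iter_mut() {
                let word = words.next().ok_or_else(|| invalid("too few numbers in a row"))?;
                *entry = word.parse::<u32>().map_err(|_| invalid("not a valid number"))?;
            }
            if words.next().is_some() {
                return Err(invalid("too many numbers in a row"));
            }
        }
        Ok(Some(matrix))
    }
}

impl<R: BufRead> Iterator for MatrixReader<R> {
    type Item = io::Result<Matrix>;

    fn next(&mut self) -> Option<Self::Item> {
        // Ok(None) becomes None, Ok(Some(m)) becomes Some(Ok(m)), Err(e) becomes Some(Err(e)).
        self.read_matrix().transpose()
    }
}
```

With this shape you can wrap a BufReader around a File and write `for matrix in MatrixReader::new(reader) { let matrix = matrix?; /* process */ }`. The sketch makes no attempt to recover after an error, so a caller would normally stop at the first Err.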

Feel free to ask any questions.


Hi @steffahn, thanks for the writeup. Really appreciate it. I am running into a slight hiccup when I run this however:

error[E0599]: the method `lines` exists for enum `Result<String, std::io::Error>`, but its trait bounds were not satisfied
   --> src/lib.rs:7:22
    |
7   |     let lines = data.lines().map(|line| line.trim());
    |                      ^^^^^ method cannot be called on `Result<String, std::io::Error>` due to unsatisfied trait bounds
    |
   ::: /Users/adi/.rustup/toolchains/nightly-aarch64-apple-darwin/lib/rustlib/src/rust/library/core/src/result.rs:503:1
    |
503 | pub enum Result<T, E> {
    | --------------------- doesn't satisfy `Result<String, std::io::Error>: BufRead`
    |
    = note: the following trait bounds were not satisfied:
            `Result<String, std::io::Error>: BufRead`
            which is required by `&mut Result<String, std::io::Error>: BufRead`

And this is the code I used (hopefully I haven't missed out any of the steps you had outlined):

let six_lines_each = lines.chunks(6);

let mut matrices: Vec<[[u32; 5]; 5]> = Vec::new();
for chunk in &six_lines_each {
    assert!(chunk.next().unwrap().is_empty());
    let mut matrix = [[0_u32; 5]; 5];
    for (line, row) in chunk.zip_eq(&mut matrix) {
        for (number, entry) in line.split_whitespace().zip_eq(row) {
            *entry = number.parse().unwrap();
        }
    }
    println!("Matrix: {:?}\n", matrix);
    matrices.push(matrix);
}

I suspect it might have something to do with unwrapping or parsing the read_to_string() part but I'm not sure.

UPDATE: After making a slight code change according to @H2CO3's suggestion about extracting the Ok value, it was smooth sailing after that.

You just have to look at the error message and/or the signature of read_to_string() to see that it returns a Result (of course it does – because what should it return if reading the file fails?). As usual, you have to handle the error somehow in order to get the Ok value (the string itself) out of it.
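
As a minimal sketch of one way to do that: the function below propagates the error with `?`, so that `data` is a plain String by the time `lines()` is called. The function name count_rows and what it computes are made up for illustration; calling `.unwrap()` or `.expect("...")` on the Result would work too if a panic on a missing file is acceptable.

```rust
use std::fs;
use std::io;

/// Count the non-empty lines of a file, handing any I/O error to the caller.
/// (Hypothetical example function.)
fn count_rows(path: &str) -> io::Result<usize> {
    // `?` returns early with the error, otherwise yields the String.
    let data = fs::read_to_string(path)?;
    Ok(data
        .lines()
        .map(str::trim)
        .filter(|line| !line.is_empty())
        .count())
}
```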


Thanks for this @H2CO3 ! In case you are curious, this was the borrow check error I was getting previously:

warning: unused variable: `file_it`
  --> src/lib.rs:25:29
   |
25 | pub fn part_two<R: BufRead>(file_it: R) -> i32 {
   |                             ^^^^^^^ help: if this is intentional, prefix it with an underscore: `_file_it`
   |
   = note: `#[warn(unused_variables)]` on by default

error[E0382]: borrow of moved value: `file_it`
    --> src/lib.rs:44:17
     |
33   | pub fn part_one<R: BufRead>(mut file_it: R) -> i32 {
     |                             ----------- move occurs because `file_it` has type `R`, which does not implement the `Copy` trait
...
40   |     for line in file_it.lines() {
     |                         ------- `file_it` moved due to this method call
...
44   |                 file_it.read_line(&mut buf).expect("Could not read line!");
     |                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^ value borrowed here after move
     |
note: this function takes ownership of the receiver `self`, which moves `file_it`
    --> /Users/adi/.rustup/toolchains/nightly-aarch64-apple-darwin/lib/rustlib/src/rust/library/std/src/io/mod.rs:2288:14
     |
2288 |     fn lines(self) -> Lines<Self>
     |              ^^^^
help: consider further restricting this bound
     |
33   | pub fn part_one<R: BufRead + Copy>(mut file_it: R) -> i32 {
     |                            ++++++

For more information about this error, try `rustc --explain E0382`.

What also confuses me in the above error is how the compiler warns me that file_it is an unused variable even though I do use it, as seen in the following code (my attempt at solving my problem as described in this forum topic):

let mut buf: String = String::new();
// need to process the first line separately
file_it.read_line(&mut buf).expect("Could not read first line!");
let draws: Vec<i32> = buf.split(",").map(|num| num.trim().parse::<i32>().unwrap()).collect();
println!("{:?}", draws);
  
let mut boards: Vec<HashMap<i32, (usize, usize)>> = vec![];
for line in file_it.lines() {
    if line.unwrap().is_empty() {
        boards.push(HashMap::new());
        for row in 0..5 {
            file_it.read_line(&mut buf).expect("Could not read line!");
            let curr_row: Vec<i32> = buf
                                    .split_whitespace()
                                    .map(|num| num.trim().parse::<i32>().unwrap())
                                    .collect();
            for (col, num) in curr_row.iter().enumerate() {
                boards
                    .last_mut()
                    .unwrap()
                    .insert(*num, (row, col));
            }
        }
    }
}

The unused variable warning points to a different function (part_two()), but it seems your use of file_it is in the function part_one(), isn't it? Therefore those are two different variables.

As for the borrow error: the compiler tells you exactly what the problem is – BufRead::lines() consumes the reader by-value, so you can't use it again (which is what you are trying to do by calling read_line() on it again).

That loop wouldn't work correctly even if it compiled, either: what should the additional read_line() method do while you are already simultaneously iterating over the lines of the file? That seems like a race condition, a logic error – the iterator itself gives you back consecutive lines, so you shouldn't read them again.
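
One way to restructure it, as a sketch: create the lines() iterator once and pull every line, including the five rows of each board, from that same iterator. I'm assuming here that the first line holds the comma-separated draws and that every board is preceded by one empty line; the names parse_input and Board are made up, and malformed input simply panics.

```rust
use std::collections::HashMap;
use std::io::{self, BufRead};

type Board = HashMap<i32, (usize, usize)>;

fn parse_input<R: BufRead>(reader: R) -> io::Result<(Vec<i32>, Vec<Board>)> {
    // Create the line iterator once; it owns the reader from here on.
    let mut lines = reader.lines();

    // First line: comma-separated draws.
    let first = lines.next().expect("input is empty")?;
    let draws: Vec<i32> = first
        .split(',')
        .map(|num| num.trim().parse::<i32>().expect("draw is not a number"))
        .collect();

    let mut boards = Vec::new();
    // `while let` borrows `lines` only for the duration of each `next()` call,
    // so the loop body is free to pull more lines from the same iterator.
    while let Some(line) = lines.next() {
        if !line?.trim().is_empty() {
            continue;
        }
        let mut board = Board::new();
        for row in 0..5 {
            let text = lines.next().expect("input ended inside a board")?;
            for (col, num) in text.split_whitespace().enumerate() {
                let value = num.parse::<i32>().expect("entry is not a number");
                board.insert(value, (row, col));
            }
        }
        boards.push(board);
    }
    Ok((draws, boards))
}
```

A `for line in lines` loop would not work here for the same reason as before: the for loop takes the iterator by value for its whole duration, whereas `while let` only touches it when asking for the next item.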


Oh yeah, now I see that the warning is pointing to the variable in part_two(). How silly of me.

The borrow error also makes sense now that you've explained it. Thanks!
