Read a file line by line


#1

Hi all,

I am currently learning Rust by reading the official book.

The task is quite simple: read a file line by line and print each line on the screen. However, I tried some code samples only to find myself fighting with the compiler. The error message I got is not very helpful to me. It would be great if anyone could point out the mistake I made. Here is the code I’m working with:

use std::io::BufReader;
use std::io::BufRead;
use std::fs::File;
use std::path::Path;
                
fn main() {   
    let f = try!(File::open("input.tsv"));
    let mut file = BufReader::new(&f);
    for line in file.lines() {
        let l = line.unwrap();
        println!("{}", l); 
    }           
}   

The error message I got is:

<std macros>:5:8: 6:42 error: mismatched types:
 expected `()`,
    found `core::result::Result<_, _>`
(expected (),
    found enum `core::result::Result`) [E0308]

I can see it’s a type error, but I do not see any function or macro that expects () as its input. The line numbers in the error message are not very helpful. I believe it must be some stupid mistake that I made.


#2

The try! macro expects the function you’re in to return a Result<T, E> value. If the expression is an Err, the macro will perform an early return.

In the Rust version used here, main() can’t return a value (from Rust 1.26 on it may return a Result), so you’ll either have to write a function that passes the error on to its caller by returning Result<T, std::io::Error>, or handle it at the call site.

use std::fs::File;
use std::io;

fn read_file() -> Result<(), io::Error> {
    let f = try!(File::open("input.tsv"));
    // yadda yadda...
    Ok(())
}

let f = match File::open("input.tsv") {
    Ok(file) => file,
    Err(e) => {
        // fallback in case of failure.
        // you could log the error, panic, or do anything else.
        println!("{}", e);
        open_another_file()
    }
};

If you’re absolutely sure that the file will open without issues (such as with integration testing), you can unwrap() the result. The program will panic in case the assertion fails.

let f = File::open("input.tsv").unwrap();
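
As a rough sketch of the first option (not from the original post): the helper name print_lines is made up for illustration, and the code uses the ? operator, which on current Rust releases performs the same early return as try!. The function prints each line and returns how many it printed, and main() handles the error at the call site.

```rust
use std::fs::File;
use std::io::{self, BufRead, BufReader};

// Hypothetical helper (the name is an assumption, not from the thread):
// prints every line of `path` and returns the number of lines printed.
fn print_lines(path: &str) -> io::Result<usize> {
    let f = File::open(path)?; // returns early on Err, as try! would
    let mut count = 0;
    for line in BufReader::new(f).lines() {
        println!("{}", line?);
        count += 1;
    }
    Ok(count)
}

fn main() {
    // The caller decides what to do with the error.
    match print_lines("input.tsv") {
        Ok(n) => println!("printed {} lines", n),
        Err(e) => println!("could not read input.tsv: {}", e),
    }
}
```

Because the error travels back through the return value, nothing here panics when the file is missing.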

#3

To put it more simply, this code:

fn main() {   
    let f = try!(File::open("input.tsv"));
}

expands to this:

fn main() {
   let f = match File::open("input.tsv") {
       Ok(v) => v,
       Err(e) => return Err(From::from(e))
   };
}

As you can see, your try!()s try to return Result<T, E>, but main() returns nothing (or returns () from Rust’s point of view), hence the errors.

The compile errors, looking like this:

<std macros>:5:8: 6:42 error: mismatched types:
 expected `()`,
    found `core::result::Result<_, _>`
(expected (),
    found enum `core::result::Result`) [E0308]
<std macros>:5 return $ crate:: result:: Result:: Err (
<std macros>:6 $ crate:: convert:: From:: from ( err ) ) } } )
<std macros>:1:1: 6:48 note: in expansion of try!
<anon>:3:13: 3:42 note: expansion site
error: aborting due to previous error
playpen: application terminated with error code 101

give you a hint with the phrases <std macros> and in expansion of try!, which mean the error comes from the try! macro.
The line containing the words expansion site points to the exact place in your code where the offending macro is used.
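
For comparison, here is a small sketch showing that the very same expansion compiles once the enclosing function returns a Result. The function name open_input is only an assumption for the example.

```rust
use std::fs::File;
use std::io;

// Hypothetical function, written out by hand the way try! expands.
fn open_input(path: &str) -> Result<File, io::Error> {
    let f = match File::open(path) {
        Ok(v) => v,
        // From::from converts the error into the function's error type;
        // both sides are io::Error here, so nothing actually changes.
        Err(e) => return Err(From::from(e)),
    };
    Ok(f)
}

fn main() {
    if let Err(e) = open_input("input.tsv") {
        println!("open failed: {}", e);
    }
}
```

The early return now has a Result to return into, so the types match.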


#4

Thank you, @kstep and @nukep !


#5

I have a similar task: read a file line by line and do a bit of processing.

    use std::io::BufReader;
    use std::io::BufRead;
    use std::fs::File;
                    
    fn main() {
        let f = File::open("data/SRR062634.filt.fastq").unwrap();
        let file = BufReader::new(&f);
        for (num, line) in file.lines().enumerate() {
            let l = line.unwrap();
            if num % 4 == 0 {
                let chars: String = l.chars().skip(1).collect();
                println!(">{}", chars);
            }
            if num % 4 == 1 {
                println!("{}", l);
            }
        }
    }

For every group of 4 lines, it keeps the first line but changes the ‘@’ at the beginning to ‘>’, and keeps the second line unchanged.

It is easy to write, but very slow compared to other languages (such as Python).

Does anyone have an idea how to improve it? Thanks.


#6

Did you compile with cargo build --release?


#7

First, did you compile it in release mode (cargo build --release)? That can have a big impact on performance.

Once you’ve done that, if it’s still very slow, here are some steps that could tune this code better.

Buffering output

The clearest way to improve the performance of this code, to me, is in how you are handling writing. By calling println! in every loop iteration, you perform a separate write call for each line that you print. Just as you’ve wrapped your reads in a BufReader, it makes more sense to wrap your writes in a BufWriter. The BufWriter collects the output in memory and only writes it out when its buffer fills up, and once more when it is dropped (at the end of your main function, in this case).

Here I have done that, creating a BufWriter::new(io::stdout()) and then calling the writeln! macro instead of the println! macro.

    use std::io;
    use std::io::BufReader;
    use std::io::BufRead;
    use std::io::BufWriter;
    use std::io::Write;
    use std::fs::File;

    fn main() {
        let f = File::open("data/SRR062634.filt.fastq").unwrap();
        let file = BufReader::new(&f);
        let mut writer = BufWriter::new(io::stdout());
        for (num, line) in file.lines().enumerate() {
            let l = line.unwrap();
            if num % 4 == 0 {
                let chars: String = l.chars().skip(1).collect();
                writeln!(writer, ">{}", chars).unwrap();
            }
            if num % 4 == 1 {
                writeln!(writer, "{}", l).unwrap();
            }
        }
    }
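
One caveat: a BufWriter ignores any error that occurs while it flushes on drop. A possible variant, sketched here with a made-up helper name (fastq_to_fasta) and made generic over reader and writer so it can run on in-memory data, flushes explicitly and returns the error instead:

```rust
use std::io::{self, BufRead, BufWriter, Write};

// Hypothetical helper: same filtering as above, but generic over the
// input and output, and returning errors instead of unwrapping.
fn fastq_to_fasta<R: BufRead, W: Write>(input: R, output: W) -> io::Result<()> {
    let mut writer = BufWriter::new(output);
    for (num, line) in input.lines().enumerate() {
        let l = line?;
        if num % 4 == 0 {
            let chars: String = l.chars().skip(1).collect();
            writeln!(writer, ">{}", chars)?;
        }
        if num % 4 == 1 {
            writeln!(writer, "{}", l)?;
        }
    }
    // Flush explicitly so a write error is reported rather than dropped.
    writer.flush()
}

fn main() {
    // Made-up sample record, only for illustration.
    let fastq = "@read1\nACGT\n+\nIIII\n";
    let mut out = Vec::new();
    fastq_to_fasta(fastq.as_bytes(), &mut out).unwrap();
    print!("{}", String::from_utf8(out).unwrap());
}
```

In the real program you would pass a BufReader around the file and io::stdout() in place of the in-memory values.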

Remove an unnecessary allocation

Another thing that is likely to be a performance problem, especially if your lines are very long, is this section here:

let chars: String = l.chars().skip(1).collect();
writeln!(writer, ">{}", chars).unwrap();

By creating a new String, you copy all of the chars from the original string (except the first one) into a new buffer. This is a new allocation for every line in the file, and depending on how long those lines are, possibly a large one. It’s also not necessary at all.

Because you know the first character is always an @, you know that the first character always takes up exactly 1 byte. So you can simply slice the string, starting from the second byte:

writeln!(writer, ">{}", &l[1..]).unwrap()

If your first character could be an arbitrary Unicode character, slicing at byte 1 could land in the middle of a multi-byte character and panic; there are other ways to get the index after the first char.
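
One such way, as a small sketch (the helper name skip_first_char is just for illustration): advance a chars() iterator by one and borrow whatever is left, which still avoids the allocation.

```rust
// Hypothetical helper: returns the line without its first character,
// whatever that character's UTF-8 width is, without allocating.
fn skip_first_char(line: &str) -> &str {
    let mut chars = line.chars();
    chars.next(); // step over the first char, if there is one
    chars.as_str() // borrow the remainder of the original string
}

fn main() {
    println!(">{}", skip_first_char("@SRR062634.1"));
    println!(">{}", skip_first_char("éread")); // first char is 2 bytes long
}
```

An empty line simply yields an empty slice, so there is no panic to worry about.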

Writing bytes instead of using format strings

In both the writeln! and println! macros, you take a formatting string, and then perform string interpolation. You don’t need to go through the string interpolation, because what you’re printing is already stringified data. You can convert your string to bytes and use the write_all method instead (plain write may write only part of the buffer).

    use std::io;
    use std::io::BufReader;
    use std::io::BufRead;
    use std::io::BufWriter;
    use std::io::Write;
    use std::fs::File;

    fn main() {
        let f = File::open("data/SRR062634.filt.fastq").unwrap();
        let file = BufReader::new(&f);
        let mut writer = BufWriter::new(io::stdout());
        for (num, line) in file.lines().enumerate() {
            let l = line.unwrap();
            if num % 4 == 0 {
                writer.write_all(b">").unwrap();
                writer.write_all(l[1..].as_bytes()).unwrap();
                writer.write_all(b"\n").unwrap();
            }
            if num % 4 == 1 {
                writer.write_all(l.as_bytes()).unwrap();
                writer.write_all(b"\n").unwrap();
            }
        }
    }

If you do this, you notice you have to write the newline character yourself. (By the way, note that b"" is a byte string, which means it has to be ASCII values, not any Unicode characters). This is much less ergonomic than just using writeln!, and the performance advantage isn’t necessarily huge, so I recommend only making this change if, after the other changes, your performance is still a problem.


#8

➜   time ./target/debug/fastq2fasta >result.txt
./target/debug/fastq2fasta > result.txt  14.24s user 1.01s system 99% cpu 15.259 total
➜    time ./target/release/fastq2fasta >result.txt 
./target/release/fastq2fasta > result.txt  0.66s user 0.70s system 99% cpu 1.372 total

I almost can’t believe the difference. Thanks.


#9

Thanks so much for the detailed explanation. I learned a lot of Rust from it.

I ran the code. Each change improved the speed, but not as much as just using --release, probably because it was already very fast.

After buffering the writes, about 2 times faster:

./target/release/fastq2fasta > result.txt  0.32s user 0.08s system 99% cpu 0.398 total

After removing the unnecessary allocation:

./target/release/fastq2fasta > result.txt  0.24s user 0.06s system 99% cpu 0.297 total

After removing writeln!:

./target/release/fastq2fasta > result.txt  0.22s user 0.06s system 99% cpu 0.282 total

#10

Locking stdout could probably also help a bit:
https://doc.rust-lang.org/std/io/struct.Stdout.html#method.lock
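
A minimal sketch of what that could look like, assuming a made-up helper write_lines that stands in for the loop from the earlier posts: the lock is taken once up front and the BufWriter sits on top of the locked handle, so individual writes don’t each have to acquire it.

```rust
use std::io::{self, BufWriter, Write};

// Hypothetical helper standing in for the processing loop:
// writes each line followed by a newline, then flushes.
fn write_lines<W: Write>(mut out: W, lines: &[&str]) -> io::Result<()> {
    for l in lines {
        out.write_all(l.as_bytes())?;
        out.write_all(b"\n")?;
    }
    out.flush()
}

fn main() {
    let stdout = io::stdout();
    // Take the lock once, then buffer on top of the locked handle.
    let writer = BufWriter::new(stdout.lock());
    write_lines(writer, &["ACGT", "TTGA"]).unwrap();
}
```

How much this helps would need measuring on the real input; I haven’t timed it.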