I am new to Rust and was curious about IO speed. I was surprised to see that reading a file with ~80M lines takes about three times longer in Rust than it does in Python or Perl. My Rust code is the following:
use std::fs::File;
use std::io::{self, BufRead};

fn main() {
    let mut file = File::open("testFile").unwrap();
    let mut i = 0;
    for line_r in io::BufReader::new(file).lines() {
        let mut line: String = line_r.unwrap();
        // Panics on an empty line; error checking is deliberately omitted
        let char = line.chars().nth(0).unwrap();
        if char == '>' {
            i += 1;
            if (i % 1000000) == 0 {
                println!("Line: {}", i);
            }
        }
    }
}
I am reading a text file and counting lines that start with ">". Note that I am intentionally avoiding any error checking for testing purposes.
Without resorting to reading large chunks, i.e., sticking with reading lines, is the code above optimal? Is there anything I am missing about IO in Rust?
You could use a BufReader without using lines, which saves you from allocating a string for each line, but that needs some more manual work using BufRead::fill_buf or using a parser library like nom that is already good at streaming byte parsing.
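To make the fill_buf suggestion concrete, here is a minimal sketch that counts lines starting with '>' by scanning the reader's internal buffer directly, so no String is allocated per line. The function name `count_marked_lines` and the in-memory sample input are made up for illustration; only `BufRead::fill_buf` and `BufRead::consume` come from the standard library.

```rust
use std::io::{self, BufRead};

/// Counts lines whose first byte is `>` by scanning the reader's buffer in place.
/// `count_marked_lines` is a hypothetical helper name, not a library function.
fn count_marked_lines<R: BufRead>(mut reader: R) -> io::Result<u64> {
    let mut count = 0u64;
    // True when the next byte we see is the first byte of a line.
    // This state must survive across buffer refills.
    let mut at_line_start = true;
    loop {
        let consumed = {
            let chunk = reader.fill_buf()?;
            if chunk.is_empty() {
                break; // end of input
            }
            for &byte in chunk {
                if at_line_start && byte == b'>' {
                    count += 1;
                }
                at_line_start = byte == b'\n';
            }
            chunk.len()
        };
        // Tell the reader how many bytes we have processed.
        reader.consume(consumed);
    }
    Ok(count)
}

fn main() -> io::Result<()> {
    // A byte slice implements BufRead, which keeps the example self-contained.
    let input: &[u8] = b">a\nb\n>c\n\n>d";
    println!("{}", count_marked_lines(input)?);
    Ok(())
}
```

For the real file you would pass something like `io::BufReader::new(File::open("testFile")?)` instead of the byte slice. Whether this beats `lines()` by a meaningful margin depends on the data, so it is worth measuring in a release build.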
Not the OP, but I'd bet good money that it was a --release issue. I can't run the same test because I don't have the same test file or the Python code he compared against, but I did grab his Rust code and built a test file. --release made a huge difference.
$ time target/debug/bench
Line: 1000000
Line: 2000000
Line: 3000000
real 0m7.683s
user 0m7.644s
sys 0m0.036s
$ time target/release/bench
Line: 1000000
Line: 2000000
Line: 3000000
@Stusamll yes! Sorry, my previous response was not clear enough. The problem was that I was compiling in debug mode rather than release mode. Using --release yielded an almost 15x speedup, which matches what you are finding.
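Since the slowdown came from an unoptimized build, one way to catch this early is to have the benchmark report how it was built. The following is only a sketch, assuming Cargo's default profiles (debug assertions enabled for dev builds and disabled with --release); `build_profile` and `timed` are hypothetical helper names, and the summing loop is a stand-in workload rather than the file-reading code.

```rust
use std::time::{Duration, Instant};

/// Guesses the build profile from `debug_assertions`.
/// Assumption: default Cargo profiles, where this flag is on for dev and off for --release.
fn build_profile() -> &'static str {
    if cfg!(debug_assertions) {
        "debug"
    } else {
        "release"
    }
}

/// Runs `work` once and returns its result with the elapsed wall-clock time.
fn timed<T>(work: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let result = work();
    (result, start.elapsed())
}

fn main() {
    if build_profile() == "debug" {
        eprintln!("warning: unoptimized build; use `cargo run --release` before comparing timings");
    }
    // Stand-in workload; replace with the code being measured.
    let (sum, elapsed) = timed(|| (0..1_000_000u64).sum::<u64>());
    println!("profile: {}, sum: {}, elapsed: {:?}", build_profile(), sum, elapsed);
}
```

A custom profile can change `debug_assertions`, so treat the reported profile as a hint rather than proof of the optimization level.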
By the way, the code can be written in a more idiomatic way:
// file doesn't have to be mut
let file = File::open("testFile").unwrap();
let mut i = 0;
for line in io::BufReader::new(file).lines() {
    // line doesn't have to be mut either
    // and you don't have to specify types explicitly most of the time
    // and you can reuse the same variable name! (shadowing)
    let line = line.unwrap();
    // no need to unwrap here
    // you can read this condition as:
    // if the line is not empty and its first character is '>'
    if line.chars().nth(0) == Some('>') {
        i += 1;
        if i % 1000000 == 0 {
            println!("Line: {}", i);
        }
    }
}
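As a possible further variation on the snippet above, the first-character check can be written with `str::starts_with`, and IO errors can be propagated with `?` instead of `unwrap`. This is a sketch rather than the one right way to do it; `count_marked` is a hypothetical name, and the in-memory input stands in for the real file.

```rust
use std::io::{self, BufRead};

/// Counts lines that begin with '>'; empty lines simply do not match.
/// `count_marked` is a hypothetical helper name used for illustration.
fn count_marked<R: BufRead>(reader: R) -> io::Result<usize> {
    let mut count = 0;
    for line in reader.lines() {
        // `?` returns the IO error to the caller instead of panicking.
        if line?.starts_with('>') {
            count += 1;
        }
    }
    Ok(count)
}

fn main() -> io::Result<()> {
    // A byte slice implements BufRead, so no file is needed to try this out.
    let input: &[u8] = b">first\n\nsecond\n>third\n";
    println!("Lines starting with '>': {}", count_marked(input)?);
    Ok(())
}
```

Taking any `BufRead` as the parameter means the same function works with `io::BufReader::new(File::open("testFile")?)` and with in-memory test data.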