Rust IO speed versus other languages

I am new to Rust and was curious about IO speed. I was surprised to see that reading a file with ~80M lines takes about three times longer in Rust than it does in Python or Perl. My Rust code is the following:

use std::fs::File;
use std::io::{self, BufRead};

fn main() {
    let mut file = File::open("testFile").unwrap();
    let mut i = 0;
    for line_r in io::BufReader::new(file).lines(){
        let mut line:String = line_r.unwrap();
        let char = line.chars().nth(0).unwrap();
        if char == '>'{
            i+=1;
        
            if (i % 1000000) == 0 {
                println!("Line: {}", i);
            }
        }
    }
}

I am reading a text file and counting lines that start with ">". Note that I am intentionally skipping error handling for testing purposes.

Without resorting to reading large chunks, i.e., sticking with reading lines, is the code above optimal? Is there anything I am missing about IO in Rust?

Thanks

Mandi

A colleague just informed me of Cargo's --release option (cargo build --release or cargo run --release). That changes things quite a bit! 😄


I'd also add that I believe println! does an implicit flush, which will also throw off these results.
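As far as I know, Rust's stdout is line-buffered, so each println! ends up flushing at the newline. If progress output ever does matter, a minimal sketch is to lock stdout once and wrap it in a BufWriter. The helper `write_progress` is hypothetical and only simulates the counting loop; it is generic over `Write` so the same code can target stdout or an in-memory `Vec<u8>`.

```rust
use std::io::{self, BufWriter, Write};

// Hypothetical helper: writes a progress line every `every` iterations, up to `total`.
// Generic over `Write`, so it works with a locked stdout or a Vec<u8>.
fn write_progress<W: Write>(out: &mut W, total: u64, every: u64) -> io::Result<()> {
    for i in 1..=total {
        if i % every == 0 {
            writeln!(out, "Line: {}", i)?;
        }
    }
    Ok(())
}

fn main() -> io::Result<()> {
    // Lock stdout once and buffer it, so output is written in large blocks
    // instead of being flushed at every newline.
    let stdout = io::stdout();
    let mut out = BufWriter::new(stdout.lock());
    write_progress(&mut out, 3_000_000, 1_000_000)?;
    // Flush explicitly so a write error is reported instead of being
    // silently ignored when the BufWriter is dropped.
    out.flush()?;
    Ok(())
}
```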

You could use a BufReader without lines(), which saves you from allocating a new String for each line, but that needs some more manual work: either using BufRead::fill_buf directly, or using a parser library like nom, which is already good at streaming byte parsing.
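A minimal sketch of the fill_buf approach, assuming all that's needed is a count of lines whose first byte is `>`. It scans the BufReader's internal buffer in place and carries an "at start of line" flag across refills, since a line can straddle two chunks. `count_gt_lines` is a hypothetical name, and unlike the original it does not panic on empty lines.

```rust
use std::fs::File;
use std::io::{self, BufRead, BufReader};

// Count lines whose first byte is b'>' by scanning the reader's internal
// buffer with fill_buf/consume: no String per line, no UTF-8 validation.
fn count_gt_lines<R: BufRead>(mut reader: R) -> io::Result<u64> {
    let mut count = 0;
    // True when the next byte is the first byte of a line. Kept across
    // refills because a line may be split between two buffers.
    let mut at_line_start = true;
    loop {
        let buf = reader.fill_buf()?;
        if buf.is_empty() {
            break; // end of input
        }
        for &b in buf {
            if at_line_start && b == b'>' {
                count += 1;
            }
            at_line_start = b == b'\n';
        }
        let len = buf.len();
        reader.consume(len); // mark the scanned bytes as used
    }
    Ok(count)
}

fn main() {
    // "testFile" is the file name from the original post.
    match File::open("testFile") {
        Ok(file) => match count_gt_lines(BufReader::new(file)) {
            Ok(n) => println!("Lines starting with '>': {}", n),
            Err(e) => eprintln!("read error: {}", e),
        },
        Err(e) => eprintln!("could not open testFile: {}", e),
    }
}
```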


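Thanks for the suggestion. I just tried reading in chunks:

let mut buf_reader = BufReader::new(file); // requires std::io::{BufReader, Read}
let mut buffer = vec![0u8; 1_000_000]; // heap-allocated; a 1 MB array could overflow a small stack
loop {
    let n = buf_reader.read(&mut buffer).unwrap();
    if n == 0 {
        break; // read() returns Ok(0) at end of file
    }
    // same logic goes here, applied to &buffer[..n]
}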

It does not seem to make a substantial difference in speed.
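For a chunked read to count the same thing as the lines() version, the loop has to stop when read returns Ok(0), and the ">" check has to cope with a line being split across two chunks. A sketch, with the hypothetical helper `count_gt_chunked` and the chunk size as a parameter. It reads from the File directly, on the assumption that wrapping it in a BufReader adds little here, since (as I understand the std implementation) reads larger than BufReader's internal buffer bypass it anyway.

```rust
use std::fs::File;
use std::io::{self, Read};

// Count lines starting with b'>' by reading fixed-size chunks with Read::read.
fn count_gt_chunked<R: Read>(mut reader: R, chunk_size: usize) -> io::Result<u64> {
    let mut buffer = vec![0u8; chunk_size]; // one heap buffer, reused for every chunk
    let mut count = 0;
    let mut at_line_start = true; // carried across chunk boundaries
    loop {
        let n = match reader.read(&mut buffer) {
            Ok(0) => break, // end of file
            Ok(n) => n,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue, // retry
            Err(e) => return Err(e),
        };
        for &b in &buffer[..n] {
            if at_line_start && b == b'>' {
                count += 1;
            }
            at_line_start = b == b'\n';
        }
    }
    Ok(count)
}

fn main() {
    // "testFile" and the 1,000,000-byte chunk size come from the post above.
    match File::open("testFile") {
        Ok(file) => match count_gt_chunked(file, 1_000_000) {
            Ok(n) => println!("Lines starting with '>': {}", n),
            Err(e) => eprintln!("read error: {}", e),
        },
        Err(e) => eprintln!("could not open testFile: {}", e),
    }
}
```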

The number of println! calls is minimal, so removing them does not noticeably affect the runtime.

Keep in mind that Python and Perl are really optimized for that kind of operation.

Can you provide your Python variant?

Maybe @mandi can update the timings when using the --release option.


Not OP, but I'd put good money on this being a --release issue. I can't run exactly the same test, because I don't have OP's test file or the Python code they ran, but I did grab the Rust code and built a test file of my own. --release made a huge difference.

$ time target/debug/bench
Line: 1000000
Line: 2000000
Line: 3000000

real 0m7.683s
user 0m7.644s
sys 0m0.036s

$ time target/release/bench
Line: 1000000
Line: 2000000
Line: 3000000

real 0m0.461s
user 0m0.432s
sys 0m0.028s
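Since the whole difference here came down to the build profile, it can help to have a benchmark report which profile it was built with. A small sketch, assuming the default Cargo profiles, where `debug_assertions` is on for debug builds and off for `--release`; `build_profile` and `timed` are hypothetical helpers, and the closure is a stand-in workload rather than the actual line-counting code.

```rust
use std::time::{Duration, Instant};

// Reports the likely build profile. debug_assertions is enabled by default
// in debug builds and disabled under --release, unless a Cargo profile
// overrides it, so treat this as a hint rather than a guarantee.
fn build_profile() -> &'static str {
    if cfg!(debug_assertions) {
        "debug"
    } else {
        "release"
    }
}

// Runs `f` once and returns its result along with the elapsed wall time.
fn timed<T>(f: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let result = f();
    (result, start.elapsed())
}

fn main() {
    // Stand-in workload; swap in the code being measured.
    let (count, elapsed) = timed(|| (0..1_000_000u64).filter(|x| x % 3 == 0).count());
    println!("[{} build] result = {}, took {:?}", build_profile(), count, elapsed);
}
```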


@Stusamll Yes! Sorry, my previous response was not clear enough. The problem was that I was compiling in debug mode rather than release mode. Using --release yielded an almost 15x speedup, which matches what you are finding.


So Rust is 5 times faster than Python and Perl? :tada:

By the way, the code can be written in a more idiomatic way:

use std::fs::File;
use std::io::{self, BufRead};

fn main() {
    // file doesn't have to be mut
    let file = File::open("testFile").unwrap();
    let mut i = 0;
    for line in io::BufReader::new(file).lines() {
        // line doesn't have to be mut either,
        // you don't have to specify types explicitly most of the time,
        // and you can reuse the same variable name (shadowing)
        let line = line.unwrap();
        // no need to unwrap here;
        // you can read this condition as:
        // "if the line is not empty and its first character is '>'"
        if line.chars().nth(0) == Some('>') {
            i += 1;
            if i % 1000000 == 0 {
                println!("Line: {}", i);
            }
        }
    }
}

A small improvement: I'd just use line.starts_with('>') there.
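Combining starts_with with a reused buffer gives a version that still reads line by line but avoids allocating a new String for every line, which lines() does. A sketch using BufRead::read_line; `count_gt_lines_reuse` is a hypothetical name, and the progress printing from the original is left out.

```rust
use std::fs::File;
use std::io::{self, BufRead, BufReader};

// Count lines starting with '>' while reading line by line, reusing a single
// String buffer instead of allocating a fresh String per line.
fn count_gt_lines_reuse<R: BufRead>(mut reader: R) -> io::Result<u64> {
    let mut line = String::new();
    let mut count = 0;
    loop {
        line.clear(); // read_line appends, so empty the buffer first
        if reader.read_line(&mut line)? == 0 {
            break; // 0 bytes read means end of file
        }
        if line.starts_with('>') {
            count += 1;
        }
    }
    Ok(count)
}

fn main() {
    // "testFile" is the file name from the original post.
    match File::open("testFile") {
        Ok(file) => match count_gt_lines_reuse(BufReader::new(file)) {
            Ok(n) => println!("Lines starting with '>': {}", n),
            Err(e) => eprintln!("read error: {}", e),
        },
        Err(e) => eprintln!("could not open testFile: {}", e),
    }
}
```

read_line still validates UTF-8 and returns an error on invalid input; if that matters, read_until(b'\n', &mut buf) with a reused Vec<u8> is the byte-level equivalent.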


Awesome! Could you update the first post with the new comparison numbers?


@bugaevc, @bluss thank you for the example. Very helpful!

I'd love to see this rounded off with the actual runtimes in Rust vs. Perl and Python once you used the --release option.