Rust IO speed versus other languages


#1

I am new to Rust and was curious about IO speed. I am surprised to see that reading a file with ~80M lines takes about three times longer in Rust than it does in Python or Perl. My Rust code is the following:

use std::fs::File;
use std::io::{self, BufRead};

fn main() {
    let mut file = File::open("testFile").unwrap();
    let mut i = 0;
    for line_r in io::BufReader::new(file).lines(){
        let mut line:String = line_r.unwrap();
        let char = line.chars().nth(0).unwrap();
        if char == '>'{
            i+=1;

            if (i % 1000000) == 0 {
                println!("Line: {}", i);
            }
        }
    }
}

I am reading a text file and counting lines that start with “>”. Note that I am intentionally avoiding any error checking for testing purposes.

Without resorting to reading large chunks, i.e., sticking with reading lines, is the code above optimal? Is there anything I am missing about IO in Rust?

Thanks

Mandi


#2

A colleague just informed me of cargo's --release option. That changes things quite a bit! :smile:


#3

I’d also add that I believe println! does an implicit flush, which will also throw off these results.
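
If flushing on every printed line does turn out to matter, one common mitigation is to lock stdout once and wrap it in a BufWriter. The sketch below is only an illustration: report_progress is a hypothetical helper, not something from the original program, and the counts are made up to mirror the progress messages.

```rust
use std::io::{self, BufWriter, Write};

/// Hypothetical helper: writes a progress line for every `step`-th item.
/// `step` is assumed to be non-zero.
fn report_progress<W: Write>(out: &mut W, total: u64, step: u64) -> io::Result<()> {
    for i in 1..=total {
        if i % step == 0 {
            writeln!(out, "Line: {}", i)?;
        }
    }
    Ok(())
}

fn main() -> io::Result<()> {
    let stdout = io::stdout();
    // Lock once and buffer the output, so data is written when the
    // buffer fills or when flush() is called, not after every line.
    let mut out = BufWriter::new(stdout.lock());
    report_progress(&mut out, 3_000_000, 1_000_000)?;
    out.flush()
}
```

Writing through a generic Write also makes it easy to point the same code at a Vec<u8> or a file when measuring.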


#4

You could use a BufReader without using lines, which saves you from allocating a string for each line, but that needs some more manual work using BufRead::fill_buf or using a parser library like nom that is already good at streaming byte parsing.
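
As a rough sketch of the "no String per line" idea, here is a version that uses BufRead::read_until with a single reused byte buffer. Note this is a simpler stand-in for the fill_buf approach mentioned above, not the same thing, and count_marked_lines is just a name picked for the example. It works on raw bytes, so no UTF-8 validation happens either.

```rust
use std::io::{self, BufRead};

/// Counts lines whose first byte is `>`, reusing one buffer for every line.
fn count_marked_lines<R: BufRead>(mut reader: R) -> io::Result<u64> {
    let mut buf = Vec::new(); // allocated once, reused for each line
    let mut count = 0;
    loop {
        buf.clear();
        // read_until appends bytes up to and including b'\n';
        // a return value of 0 means end of input.
        if reader.read_until(b'\n', &mut buf)? == 0 {
            break;
        }
        if buf.first() == Some(&b'>') {
            count += 1;
        }
    }
    Ok(count)
}

fn main() -> io::Result<()> {
    // In-memory sample data; &[u8] implements BufRead.
    let sample: &[u8] = b">seq1\nACGT\n>seq2\nTTGA\n";
    let count = count_marked_lines(sample)?;
    println!("Lines starting with '>': {}", count);
    Ok(())
}
```

To run it against a real file, pass BufReader::new(File::open("testFile")?) instead of the sample slice.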


#5

Thanks for the suggestion. I just tried with

let mut buf_reader = BufReader::new(file);
let mut buffer = [0; 1000000];
while let Ok(x) = buf_reader.read(&mut buffer) {
    if x == 0 {
        break; // read returns Ok(0) at end of file
    }
    // same logic goes here
}

It did not seem to make a substantial difference in speed.

The number of println! calls is minimal so removing it does not noticeably affect the runtime.


#6

Keep in mind that Python and Perl are really optimized for that kind of operation.


#7

Can you provide your Python variant?


#8

Maybe @mandi can update the timings when using the --release option.


#9

Not OP, but I’d put good money on it being a --release issue. I can’t run the same test because I don’t have the same test file or the Python code he compared against, but I did grab his Rust code and build a test file. --release made a huge difference.

$ time target/debug/bench
Line: 1000000
Line: 2000000
Line: 3000000

real 0m7.683s
user 0m7.644s
sys 0m0.036s

$ time target/release/bench
Line: 1000000
Line: 2000000
Line: 3000000

real 0m0.461s
user 0m0.432s
sys 0m0.028s


#10

@Stusamll yes! Sorry my previous response was not clear enough. The problem was that I was compiling in debug mode and not release mode. Using --release yielded an almost 15x speedup, which matches what you are finding.
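
For anyone timing things later, a small guard like the following can make this mistake harder to repeat. It is only a sketch: build_profile is a made-up helper, the workload is a stand-in, and cfg!(debug_assertions) is a heuristic (it is on by default for a plain cargo build and off with --release, but a project can override that in its Cargo.toml profiles).

```rust
use std::time::Instant;

/// Hypothetical helper: reports "debug" when debug assertions are enabled,
/// which is the default for builds made without --release.
fn build_profile() -> &'static str {
    if cfg!(debug_assertions) {
        "debug"
    } else {
        "release"
    }
}

fn main() {
    if build_profile() == "debug" {
        eprintln!("warning: this looks like an unoptimized build; timings will be misleading");
    }
    let start = Instant::now();
    // Stand-in workload; replace with the code being measured.
    let sum: u64 = (0..10_000_000u64).filter(|n| n % 3 == 0).sum();
    println!(
        "sum = {}, elapsed = {:?} ({} build)",
        sum,
        start.elapsed(),
        build_profile()
    );
}
```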


#11

So Rust is 5 times faster than Python and Perl? :tada:

By the way, the code can be written in a more idiomatic way:

// file doesn't have to be mut
let file = File::open("testFile").unwrap();
let mut i = 0;
for line in io::BufReader::new(file).lines() {
    // line doesn't have to be mut either
    // and you don't have to specify types explicitly most of the time
    // and you can reuse the same variable name! (shadowing)
    let line = line.unwrap();
    // no need to unwrap here
    // you can read this condition as:
    // if the line is not empty and its first character is '>'
    if line.chars().nth(0) == Some('>') {
        i += 1;
        if i % 1000000 == 0 {
            println!("Line: {}", i);
        }
    }
}

#12

A small improvement: I’d just use line.starts_with('>') there.
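
A minimal sketch of how that might look, wrapped in a function so it can be tried on in-memory data. count_header_lines is a name chosen for this example only, and the sample text is made up. Unlike the unwrap on the first character in the original post, starts_with simply returns false for an empty line.

```rust
use std::io::{self, BufRead};

/// Counts lines that start with '>'.
fn count_header_lines<R: BufRead>(reader: R) -> io::Result<u64> {
    let mut count = 0;
    for line in reader.lines() {
        // `?` propagates read errors instead of panicking.
        if line?.starts_with('>') {
            count += 1;
        }
    }
    Ok(count)
}

fn main() -> io::Result<()> {
    // Made-up sample input, including an empty line.
    let sample = ">seq1\nACGT\n\n>seq2\nTTGA\n";
    let count = count_header_lines(sample.as_bytes())?;
    println!("Lines starting with '>': {}", count); // prints 2 for this sample
    Ok(())
}
```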


#13

Awesome! Could you update the first post with the new comparison number?


#14

@bugaevc, @bluss thank you for the example. Very helpful!


#15

I would love to see this rounded off with the actual runtimes in Rust vs. Perl and Python once the --release option was used.