Reading and writing files: speed problems

Hi,

Recently I implemented a procedure that reads through a file, makes some modifications, and prints out the modified lines. When I tested it on large files I realized it took way longer than what I am used to... So I stripped the loop down to the bare minimum and gave it a go:

use std::fs::File;
use std::io::{BufRead, BufReader};

fn main() {
    let file = File::open("big_file.txt").unwrap();
    let reader = BufReader::new(file);
    for line in reader.lines() {
        println!("{}", line.unwrap());
    }
}

/*  
cargo build --release  
time ./readwrite > xx

real	1m2,755s
user	0m14,574s
sys	0m39,310s
*/

Next I decided to use a faster library (the filebuffer crate):


use filebuffer::FileBuffer;
use std::str;

fn main() {
    let fbuffer = FileBuffer::open("big_file.txt").unwrap();
    let lines = str::from_utf8(&fbuffer).expect("not valid UTF-8");
    for line in lines.lines() {
        println!("{}", line);
    }
}

/*
cargo build --release 
./readwrite 

real	1m4,553s
user	0m13,777s
sys	0m38,522s
*/

And then I did the same in Perl:

perl -lne 'print $_' big_file.txt > xx

real	0m18,884s
user	0m4,414s
sys	0m6,736s

In C (compiled with -O3) I was able to read through this file in under 4 seconds.

What am I doing wrong?

Just a wild guess, but println! grabs a lock on stdout on every call; it's not what you want for speedy output.
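
To illustrate the locking point, here is a minimal sketch that takes the stdout lock once and writes through the locked handle instead of calling println! per line. The helper name `copy_lines` is made up for this example, and it reads from stdin rather than opening big_file.txt so that it runs anywhere. Note that locking alone does not change how stdout is buffered, so this is probably only part of the fix.

```rust
use std::io::{self, BufRead, Write};

/// Copies every line from `input` to `out` and returns how many lines were written.
/// `copy_lines` is a hypothetical helper name, not something from the thread.
fn copy_lines<R: BufRead, W: Write>(input: R, mut out: W) -> io::Result<u64> {
    let mut count = 0;
    for line in input.lines() {
        writeln!(out, "{}", line?)?;
        count += 1;
    }
    Ok(count)
}

fn main() -> io::Result<()> {
    let stdin = io::stdin();
    let stdout = io::stdout();
    // Acquire the lock once up front; println! would re-acquire it on every call.
    let mut out = stdout.lock();
    copy_lines(stdin.lock(), &mut out)?;
    out.flush()
}
```

It could be run as `./readwrite < big_file.txt > xx` to compare against the timings above.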


Writing to standard output is slow in Rust because stdout is line-buffered by default, even when it is redirected to a file, so every line ends up as its own write call.

Using a 128 MB file with 10000000 lines:

$ time ./target/release/dsafjaskdl > output 
./target/release/dsafjaskdl > output  4.51s user 14.13s system 99% cpu 18.648 total

Remove the newlines and run it on the resulting single-line 118 MB file:

$ time ./target/release/dsafjaskdl > output    
./target/release/dsafjaskdl > output  0.03s user 0.09s system 99% cpu 0.124 total
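
The line-buffering effect described above can be made visible without a big file. The sketch below is an illustration under assumptions: `CountingSink` is a made-up writer that discards data and counts calls to `write`, standing in for write system calls, and `LineWriter` is used as a stand-in for the way stdout buffers by line. The same lines are then pushed through a `BufWriter` for comparison.

```rust
use std::io::{self, BufWriter, LineWriter, Write};

/// Hypothetical sink that throws data away but counts how often `write` is called.
struct CountingSink {
    calls: usize,
}

impl Write for CountingSink {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.calls += 1;
        Ok(buf.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Number of writes that reach the sink when output is flushed at every newline.
fn calls_line_buffered(lines: usize) -> io::Result<usize> {
    let mut w = LineWriter::new(CountingSink { calls: 0 });
    for i in 0..lines {
        writeln!(w, "line {}", i)?;
    }
    w.flush()?;
    Ok(w.get_ref().calls)
}

/// Number of writes that reach the sink when output is collected in a block buffer.
fn calls_block_buffered(lines: usize) -> io::Result<usize> {
    let mut w = BufWriter::new(CountingSink { calls: 0 });
    for i in 0..lines {
        writeln!(w, "line {}", i)?;
    }
    w.flush()?;
    Ok(w.get_ref().calls)
}

fn main() -> io::Result<()> {
    let n = 1000;
    println!("line-buffered:  {} writes", calls_line_buffered(n)?);
    println!("block-buffered: {} writes", calls_block_buffered(n)?);
    Ok(())
}
```

The line-buffered count grows with the number of lines, while the block-buffered count grows with the number of bytes divided by the buffer size, which would be consistent with the large sys times reported earlier.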

What is your real time? I am still finding that the compiled Rust program is approx. 3x slower than Perl when printing to stdout is eliminated... but the C equivalent is still 2x faster than Perl...

Note that by using String, you're also forcing the program to check at runtime that the data is valid UTF-8. You could use a Vec<u8> instead.
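
A minimal sketch of the Vec<u8> idea, assuming the per-line modifications can be done on raw bytes: `BufRead::read_until` fills a reusable byte buffer, so nothing is validated as UTF-8 and no String is allocated per line. The helper name `copy_lines_bytes` is made up for this example, and it reads from stdin instead of big_file.txt so it runs without that file.

```rust
use std::io::{self, BufRead, BufWriter, Write};

/// Copies `input` to `out` line by line as raw bytes, skipping UTF-8 validation.
/// `copy_lines_bytes` is a hypothetical helper name, not something from the thread.
fn copy_lines_bytes<R: BufRead, W: Write>(mut input: R, mut out: W) -> io::Result<()> {
    // One buffer reused for every line, so there is no per-line allocation.
    let mut line = Vec::new();
    // read_until keeps the trailing b'\n' in `line` and returns 0 at end of input.
    while input.read_until(b'\n', &mut line)? != 0 {
        // Per-line modifications on the raw bytes would go here.
        out.write_all(&line)?;
        line.clear();
    }
    Ok(())
}

fn main() -> io::Result<()> {
    let stdin = io::stdin();
    let stdout = io::stdout();
    // Block-buffer the output so short lines are written in large chunks.
    let mut out = BufWriter::new(stdout.lock());
    copy_lines_bytes(stdin.lock(), &mut out)?;
    out.flush()
}
```

Because the bytes are passed through untouched, input that is not valid UTF-8 is copied as-is instead of producing an error.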


It looks like there's an issue for exactly this:

One way to avoid the line buffering problem would be to wrap stdout in another, much larger buffer.

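use std::fs::File;
use std::io::{BufRead, BufReader, BufWriter, Write};

fn main() {
    let file = File::open("big_file.txt").unwrap();
    let reader = BufReader::new(file);
    // BufWriter collects output into larger chunks instead of one write per line.
    let mut writer = BufWriter::new(std::io::stdout());

    for line in reader.lines() {
        writeln!(writer, "{}", line.unwrap()).unwrap();
    }
    // Flush explicitly so a write error surfaces here instead of being ignored on drop.
    writer.flush().unwrap();
}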

My fellow Rust-ians, I thank you!