Testing IO and formatting performance


#1

I’m trying to reproduce the results from this interesting question asked on Quora.

It splits a CSV file from rows into columns; the implementation is given here:
C++ source

I tried to replicate this in Rust as an exercise (and as a way to get Rust into our organisation).

#![feature(test)]
use std::{fs};
use std::io::Read;
use std::io::Write;
use std::env;
use std::str;

fn convert_file(filename : &str) {
    let mut input = fs::File::open(&filename).expect("Could not open file");
    let mut buf = Vec::with_capacity(100000);
    match input.read_to_end(&mut buf) {
        Ok(n)  => {
            let mut output = fs::File::create(&filename.replace(".txt",".csv")).expect("Could not open file for writing");
            let mut outbuf = Vec::with_capacity((n as f32 *1.05) as usize);
            for mut i in buf.split(|&x| x == b',').collect::<Vec<_>>() {
                outbuf.append(&mut i.to_vec());
                outbuf.push(b'\n');
            }
            output.write_all(&outbuf).expect("Could not write file");
        }
        Err(e) => println!("{:?}",e),
    }
}
fn main() {
    let args : Vec<String> = env::args().collect();
    if args.len() < 2 {
        println!("No filename given");
        return;
    }
    convert_file(&args[1]);

}

#[cfg(test)]
mod tests{
    use super::*;
    extern crate test;
    use self::test::Bencher;
    #[bench]
    fn read_file(b: &mut Bencher) {
        b.iter(|| {
            let _ = convert_file("new.txt");
            0
        });
    }
}

Performance is lacklustre, so I was wondering if anyone has ideas on improving this.

λ .\StrReplaceCC.exe new.txt
Starting with std::string
CPP Runtime in mSecs: 216
Starting with C stdlib functions
C Runtime in mSecs: 90
Total Runtimes
CPP with std::string: 0.216000
C with stdlib: 0.090000
In this example, C was 2.000000 times faster than C++
Press return to continue

λ cargo bench
warning: custom registry support via the `registry.index` configuration is being removed, this functionality will not work in the future
   Compiling strreplace v0.1.0 (file:///C:/Users/ukb99427/visual%20studio%202015/Projects/StrReplaceCC/strreplace)
    Finished release [optimized] target(s) in 1.30 secs
     Running target\release\deps\strreplace-f3267b72a0069d20.exe

running 1 test
test tests::read_file ... bench: 539,857,042 ns/iter (+/- 43,616,165)

To generate a sample file you can use this Python script:

if __name__ == "__main__":
    # generate
    fs = open("new.txt","w")
    for i in range(0,1000000):
        fs.write("fewfw.car@gmail.com,")
        fs.write("bar.cfwfwar@gmail.com,")
        fs.write("efwefw.cfwfwar@gmail.com,")
    fs.close()

#2

Didn’t look too closely, but you’re doing a ton of allocations in:

for mut i in buf.split(|&x| x == b',').collect::<Vec<_>>() { // allocation here to collect everything back into a Vec
    outbuf.append(&mut i.to_vec()); // another Vec allocation here
    outbuf.push(b'\n');
}

This code should look more like:

let mut outbuf = Vec::with_capacity((n as f32 * 1.05) as usize);
for i in buf.split(|&x| x == b',') {
    outbuf.extend(i);
    outbuf.push(b'\n');
}

#3

That helps, we are down to
test tests::read_file ... bench: 328,474,961 ns/iter (+/- 29,751,066)
about 1.5x slower than the C++

Interestingly, the naive approach of pushing every byte is slower in Rust, even though that is the approach that works fastest in the C++ version.


#4

This is (most likely) because the Vec is doing capacity checks for each push; I didn’t look at the C++ code, but I’m assuming it does no such thing? You generally want to batch operations (in any language, really).
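
To make the per-push versus batched distinction concrete, here is a minimal sketch of both styles. The function names are made up for illustration, and whether the compiler removes the per-push capacity checks depends on the build, so treat this as something to benchmark rather than a guaranteed win.

```rust
/// Per-byte version: one `push` (and potentially one capacity check) per input byte.
fn convert_per_byte(input: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(input.len() + 1);
    for &b in input {
        // Replace each comma with a newline, copy everything else unchanged.
        out.push(if b == b',' { b'\n' } else { b });
    }
    out.push(b'\n'); // terminate the final field
    out
}

/// Batched version: one `extend_from_slice` per field, copying the whole field at once.
fn convert_batched(input: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(input.len() + 1);
    for field in input.split(|&b| b == b',') {
        out.extend_from_slice(field);
        out.push(b'\n');
    }
    out
}
```

Both functions produce the same bytes, so they can be swapped inside the benchmark to see which one is faster on a given machine.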


#5

I think a lot of the performance issues are because you directly ported C++ code to Rust, so you use your usual C++ patterns instead of Rust ones. You can actually do this entire operation without needing to allocate intermediate buffers by using iterators. As a bonus, this also makes the code significantly cleaner.

use std::fs::File;
use std::io::{BufRead, BufReader, BufWriter, Error, Write};
use std::path::Path;

fn convert_file(filename: &Path) -> Result<(), Error> {
    let output_file = filename.with_extension("csv");

    let input = BufReader::new(File::open(filename)?);
    let mut output = BufWriter::new(File::create(output_file)?);

    // create a lazy iterator over the input by splitting on every ',' byte
    let columns = input.split(b',');

    for column in columns {
        // because this is a lazy iterator and we're doing IO (which may fail)
        // it returns a Result every time. We use "?" to get the inner value,
        // bailing early if there was an error.
        let row = column?;
        // write_all keeps writing until the whole slice is out; a bare
        // write() is allowed to stop after a partial write.
        output.write_all(&row)?;
        output.write_all(b"\n")?;
    }

    Ok(())
}

All that program does is open the file, wrap it in a BufReader, and use the BufReader's split() method to split the input stream on every comma, writing each field to the output file as a new row. You should also get some speed-ups by using buffered readers/writers instead of reading/writing directly from the file object (each read() call is usually an expensive syscall).


#6

Hi @Michael-F-Bryan

Thanks for the more Rust-like version. Though it is more Rust-like and cleaner, it’s also sadly a lot slower:

running 1 test
test tests::read_file ... bench: 469,412,642 ns/iter (+/- 60,976,466)

@vitalyd’s fixes worked better.

But you did give me an idea on file io; giving me a solution like this, which uses the BufWriter for output, but still allocates memory for the input.

fn convert_file(filename: &Path) -> Result<(), Error> {
    let output_file = filename.with_extension("csv");
    let mut input = File::open(filename)?;
    let mut buf = Vec::with_capacity(100000);
    // propagate read errors instead of silently ignoring them
    input.read_to_end(&mut buf)?;
    let mut output = BufWriter::new(File::create(output_file)?);
    let splitter = buf.split(|&x| x == b',');
    for i in splitter {
        // write_all retries until the whole slice is written
        output.write_all(i)?;
        output.write_all(b"\n")?;
    }
    Ok(())
}

This has a performance of

test tests::convert_file ... bench: 271,136,114 ns/iter (+/- 44,077,904)

within 25% of the C++ version but still about 3x slower than the C version (271 ms vs. 90 ms).

This is a really useful exercise for a larger PoC at work, where we have ~100 MB structured files (like JSON but not exactly) containing market data for trades. The current C++ version is OK but slow and buggy. I want to get Rust in, on the basis that it’s far more robust, but we can’t sacrifice speed given that pricing each trade takes 30s, of which the C++ market data processing is 15s.


#7

Note that std::io::Split allocates a new Vec on each next(): https://doc.rust-lang.org/src/std/io/mod.rs.html#2094-2110

This is true but only if you’re read()ing a byte at a time (or a small amount). The program here reads into a large buffer (Vec) and writes from a large buffer.
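
If the per-item Vec from std::io::Split does turn out to matter, one option is BufRead::read_until, which appends into a buffer you own so a single allocation can be reused for every field. A minimal sketch, with a made-up function name and generic reader/writer parameters chosen only for illustration:

```rust
use std::io::{self, BufRead, Write};

/// Streams `input` to `output`, writing each comma-separated field on its own
/// line. One scratch buffer is reused instead of allocating a Vec per field.
fn split_reusing_buffer<R: BufRead, W: Write>(mut input: R, mut output: W) -> io::Result<()> {
    let mut field = Vec::new();
    loop {
        field.clear();
        // read_until appends everything up to and including the delimiter
        // and returns the number of bytes read (0 means end of input).
        if input.read_until(b',', &mut field)? == 0 {
            break;
        }
        // Drop the trailing comma so the output matches the split() versions.
        if field.last() == Some(&b',') {
            field.pop();
        }
        output.write_all(&field)?;
        output.write_all(b"\n")?;
    }
    Ok(())
}
```

In the real program the arguments would be a BufReader around the input File and a BufWriter around the output File.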


#8

Why not a slice? Since a split() would never change an input array?


#9

The above is for an IO split, not a slice. You can’t split IO until you’ve actually read the data :)


#10

@rusty_ron have you tried measuring the IO and splitting/replacement separately between C, C++ and Rust?
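
A rough way to do that on the Rust side is to time each phase with std::time::Instant. This is only a sketch: `timed` and `measure_phases` are hypothetical helpers, and file reads will be affected by the OS file cache, so repeat the measurement several times.

```rust
use std::io::{self, Read, Write};
use std::time::{Duration, Instant};

/// Runs `f` once and returns its result together with the elapsed wall-clock time.
fn timed<T>(f: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let value = f();
    (value, start.elapsed())
}

/// Times the read, the comma-to-newline transformation, and the write separately.
/// Returns (read time, transform time, write time).
fn measure_phases<R: Read, W: Write>(
    mut input: R,
    mut output: W,
) -> io::Result<(Duration, Duration, Duration)> {
    let mut buf = Vec::new();
    let (read_result, read_time) = timed(|| input.read_to_end(&mut buf));
    read_result?;

    let (outbuf, transform_time) = timed(|| {
        // Same splitting logic as the versions posted earlier in the thread.
        let mut out = Vec::with_capacity(buf.len() + 1);
        for field in buf.split(|&b| b == b',') {
            out.extend_from_slice(field);
            out.push(b'\n');
        }
        out
    });

    let (write_result, write_time) = timed(|| output.write_all(&outbuf));
    write_result?;

    Ok((read_time, transform_time, write_time))
}
```

Passing a File for both parameters gives the three durations for a real run; comparing them with equivalent timers in the C and C++ programs would show whether the gap is in IO or in the splitting.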


#11

I recently wrote a log parser in Rust.

  • I used the lines() iterator to do the splitting. Even though it allocates a new Vec every time it didn’t slow things down. Try it and measure.
  • The biggest speed up came from making it multi-threaded using channels. I read the lines in one thread, sent them over a channel to a second thread for parsing, and then on to a third thread for writing. (Actually it was more complicated than that since my parsing was a lot more involved than yours, but you get the picture).

The final speed was impressive, it can process a log file faster than Windows can copy it. (My parser only writes about 20% of the original data, which is why it was faster). The code is not public but the basics are:

let (tx_lines, rx_lines) = channel();
let (tx_lines2, rx_lines2) = channel();
let (tx_lines3, rx_lines3) = channel();
let (tx_lines4, rx_lines4) = channel();
let (tx_lines5, rx_lines5) = channel();

// This basically handles reading the line into a new Vec.
let handle1 = thread::spawn(move || {
    for buffer in fast_logfile_iterator::FastLogFileIterator::new(reader) {
        if tx_lines.send(buffer).is_err() {
            break;
        }
    }
});

// Then a second thread takes that Vector and parses out the main components
// and passes them on.
let handle2 = thread::spawn(move || {
    for line in rx_lines {
        let parsed_data = parse_main_block(&line);
        let result = (line, parsed_data);
        if tx_lines2.send(result).is_err() {
            break;
        }
    }
});

// elided...

// The final thread just writes out the vec. It might be tempting to get rid
// of this thread and just move the call to write() into the previous thread
// but the runtime more than doubles if you do that.
let handle6 = thread::spawn(move || {
    for line in rx_lines5 {
        writer.write_all(&line).unwrap();
    }
});

handle1.join().unwrap();
handle2.join().unwrap();
handle3.join().unwrap();
handle4.join().unwrap();
handle5.join().unwrap();
handle6.join().unwrap();

#12

The other observation I would make is that if you can read the entire file into memory in one hit, you could use rayon to run the parsing in parallel. (I had to stream mine, which is why I used channels).


#13

Not allowed - threading is strictly forbidden in the code


#14

As an aside, I don’t think the input and output Vecs are sized like the buffers in the C++ code. The C++ code allocates an input buffer equal to the file length. It also counts the number of commas upfront and then allocates the output based on that.

As for threading, I don’t think it makes sense to parallelize over a file - the work is too trivial and IO is sequential. You can parallelize over multiple files though, but that defeats the purpose of the exercise I believe.
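
A sketch of that sizing strategy in Rust, for comparison. It assumes each comma is kept and followed by a newline, so the output is exactly one byte longer per comma; I have not checked this against the C++ source, and the function name is made up.

```rust
/// Counts the commas first, then allocates the output once at its exact final size.
fn convert_presized(input: &[u8]) -> Vec<u8> {
    let commas = input.iter().filter(|&&b| b == b',').count();
    // Assumption: every comma is kept and gains a '\n' after it,
    // so the output needs input.len() + commas bytes.
    let mut out = Vec::with_capacity(input.len() + commas);
    for &b in input {
        out.push(b);
        if b == b',' {
            out.push(b'\n');
        }
    }
    out
}
```

The input buffer can be sized the same way by reading the file length from its metadata and passing it to Vec::with_capacity before read_to_end.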


#15

@rusty_ron, forgot to ask - have you tried a simple in-place replacement loop? Something like:

let mut file = fs::File::create(output_file)?;
for b in buf.iter_mut() {
    if *b == b',' {
        *b = b'\n';
    }
}
file.write_all(&buf)?;

#16

That would not quite work. The specifics of the problem require an extra character [',', '\n'], thus the need for memory allocation and/or a larger buffer for the writes than for the read.


#17

Sorry, what extra character is required?


#18

There is supposed to be a comma before the newline, IIUC. But the current Rust code does not add it…


#19

Hmm, ok - none of the code using split() that’s been pasted includes the original comma in the output (split() does not include the separator in the iteration). Hence I suggested the simple replacement.


#20

That’s an oversight on my part, so apologies. Adding the comma back would probably add a slight overhead.
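
For what it's worth, one way to keep the comma in the split-based versions is slice::split_inclusive (stable since Rust 1.51), which leaves the separator at the end of each piece. A minimal sketch with a made-up function name, not benchmarked:

```rust
use std::io::{self, Write};

/// Writes each field on its own line, keeping the comma that ended it.
fn write_rows_keeping_commas<W: Write>(buf: &[u8], mut output: W) -> io::Result<()> {
    // split_inclusive yields "field," slices; a final field without a
    // trailing comma is yielded as-is.
    for field in buf.split_inclusive(|&b| b == b',') {
        output.write_all(field)?;
        output.write_all(b"\n")?;
    }
    Ok(())
}
```

Here `output` would be the BufWriter from the earlier version, and `buf` the Vec filled by read_to_end.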