Slow ndarray_csv reading large text file

Hello,

I have a large text file that is output from another software package. ndarray_csv takes about 32 seconds to read it, but numpy can do it in 8 seconds via np.loadtxt(). I assume I am doing something wrong to get such a slow read. I am simply using the code from the ndarray_csv docs:

use std::error::Error;
use std::fs::File;
use std::path::PathBuf;

use csv::ReaderBuilder;
use ndarray::Array2;
use ndarray_csv::Array2Reader;

// Box<dyn Error> covers both File::open's io::Error and ndarray_csv's ReadError.
fn read_array_data(p: &PathBuf, has_header: bool) -> Result<Array2<f64>, Box<dyn Error>> {
    let file = File::open(p)?;
    let mut reader = ReaderBuilder::new()
        .has_headers(has_header)
        // .comment(Some(b'!'))
        .from_reader(file);
    let array_read: Array2<f64> = reader.deserialize_array2((1200, 18004))?;
    Ok(array_read)
}

Any thoughts on improving the speed would be helpful!

Adjust the buffer size of the reader type. Then you'll see the performance you want.
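
For example, a minimal sketch assuming the shape from your original post; buffer_capacity is the setter on csv's ReaderBuilder, and the 8 MiB figure and the "data.csv" path are only illustrative:

use std::error::Error;
use std::fs::File;

use csv::ReaderBuilder;
use ndarray::Array2;
use ndarray_csv::Array2Reader;

fn main() -> Result<(), Box<dyn Error>> {
    // "data.csv" is a stand-in for the real file.
    let file = File::open("data.csv")?;
    let mut reader = ReaderBuilder::new()
        .has_headers(false)
        // Use an 8 MiB read buffer instead of the default; tune to taste.
        .buffer_capacity(8 * 1024 * 1024)
        .from_reader(file);
    let array: Array2<f64> = reader.deserialize_array2((1200, 18004))?;
    println!("read a {} x {} array", array.nrows(), array.ncols());
    Ok(())
}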

Also, have you remembered to use --release?
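
For reference, the flag goes on the cargo invocation; debug builds can easily be an order of magnitude slower for parsing-heavy code like this:

cargo run --release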

Ah, the buffer size: this controls how much data it will slurp up on each read? Makes sense.

I tried a direct read approach for my particular file and got about a 30% speed-up:

use std::fs::File;
use std::io::{BufRead, BufReader, Error as IOError};
use std::path::PathBuf;

fn read_direct(p: &PathBuf) -> Result<Vec<Vec<f64>>, IOError> {
    let file = File::open(p)?;
    let reader = BufReader::new(file);
    let mut res = vec![];
    for line in reader.lines() {
        let l = line?;
        // Panics if any field fails to parse as f64.
        res.push(
            l.split(',')
                .map(|x| x.parse::<f64>().expect("field is not a valid f64"))
                .collect(),
        );
    }
    Ok(res)
}
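
If anyone wants to reproduce the timings, a minimal harness along these lines works; it assumes the read_direct function above is in scope, and "data.csv" is a stand-in path:

use std::path::PathBuf;
use std::time::Instant;

fn main() {
    // Stand-in path; substitute the actual file.
    let path = PathBuf::from("data.csv");
    let start = Instant::now();
    let rows = read_direct(&path).expect("read failed");
    println!("read {} rows in {:.2?}", rows.len(), start.elapsed());
}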

BUT, BOOM! You nailed it with the --release flag: now the original read is 1.5 secs and the "direct" read is 1.3 secs. That's a huge difference, obviously. My bad for forgetting about that! Thanks for the help!!

Sorry for the brief reply, but I don't have time to write this up in more depth.
I urge you to use the buffer_capacity setter on the reader builder and specify the size yourself, then see how much faster that makes the whole thing. Personally, I use a very large buffer.

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.