Slow ndarray_csv reading large text file


I have a large text file that is output from another software package. ndarray_csv takes about 32 seconds to read it, but numpy can do it in 8 seconds via np.loadtxt(). I am assuming I am doing something wrong to have such a slow read. I am simply using the code from the ndarray_csv docs:

fn read_array_data(p: &PathBuf, has_header: bool) -> Result<Array2<f64>, Box<dyn Error>> {
    let file = File::open(p)?;
    let mut reader = ReaderBuilder::new()
        .has_headers(has_header)
        // .comment(Some(b'!'))
        .from_reader(file);
    let array_read: Array2<f64> = reader.deserialize_array2((1200, 18004))?;
    Ok(array_read)
}

Any thoughts on improving the speed would be helpful!

Adjust the buffer size of the reader type. Then you'll see the performance you want.

Also, have you remembered to build with --release?


Ah, the buffer size, this controls how much data it will slurp up on each read? Makes sense.

I tried a direct read approach for my particular file and got a 30% increase:

fn read_direct(p: &PathBuf) -> Result<Vec<Vec<f64>>, IOError> {
    let file = File::open(p)?;
    let reader = BufReader::new(file);
    let mut res = vec![];
    for line in reader.lines() {
        let l = line?;
        let row: Vec<f64> = l
            .split(',') // the file's delimiter; adjust to match your data
            .map(|x| x.parse::<f64>().expect("NAN!"))
            .collect();
        res.push(row);
    }
    Ok(res)
}

BUT, BOOM! You nailed it with the --release flag, now the original read is 1.5 secs and the "direct" read is 1.3 secs. That's a huge difference obviously. My bad for forgetting about that! Thanks for the help!!

Sorry for the brief reply, but I'm unable to go into more depth right now.
I urge you to use the buffer setter on the reader builder and specify the size yourself, then see how fast that makes the whole thing. Personally, I use a very large buffer.
