Optimal way to clone a StringRecord

I'm developing a crate for correlation analysis in bioinformatics. A few months ago I asked this question about the iproduct! iterator with a lazy matrix. Since the right-hand side of iproduct! needs to be cloneable and I didn't have much time, I ended up working with a collected matrix.

Now I'm working with big files (2.5 GB or more), so collecting the matrix is not an option. A user on my previous question suggested making a custom struct that implements the Clone trait, so I made the changes (playground).

NOTE: I'm using PyO3 (version 0.12.4) because I want the crate to be usable from Python too; that's why there are several PyResult values and custom exceptions, but they're not important for this question.

Here's the playground code:

use csv::{Reader, ReaderBuilder};
use pyo3::create_exception;
use pyo3::PyResult;
use std::fs::File;
use itertools::iproduct;

// Custom exception to be used in Python
create_exception!(myproject, GGCAError, pyo3::exceptions::PyException);

// Types
type TupleExpressionValues = (String, Option<String>, Vec<f64>);
type LazyMatrixInner = Box<dyn Iterator<Item = TupleExpressionValues>>;

// Generates a Reader from csv file with tab delimiter
fn reader_from_path(path: &str) -> PyResult<Reader<File>> {
    let reader_builder = ReaderBuilder::new().delimiter(b'\t').from_path(path);

    match reader_builder {
        Err(er) => Err(GGCAError::new_err(format!(
            "The dataset '{}' has thrown an error: {}",
            path, er
        ))),
        Ok(reader) => Ok(reader),
    }
}

pub struct LazyMatrix {
    path: String,
    inner: LazyMatrixInner,
}

impl LazyMatrix {
    pub fn new(path: &str) -> PyResult<Self> {
        let lazy_matrix = Self::get_df(path)?;

        Ok(LazyMatrix {
            path: path.to_string(),
            inner: lazy_matrix,
        })
    }
    
    fn get_df(path: &str) -> PyResult<LazyMatrixInner> {
        // Build the CSV reader and iterate over each record.
        let reader = reader_from_path(path)?;
        let dataframe_parsed = reader
            .into_records()
            .enumerate()
            .map(|(_row_idx, record_result)| {
                let record = record_result.unwrap();
                let mut it = record.into_iter();
                let gene_or_gem = it.next().unwrap().to_string();
                let lazy_matrix = it
                    .enumerate()
                    .map(|(_column_idx, cell)| {
                        cell.parse::<f64>().expect("Error...")
                    })
                    .collect::<Vec<f64>>();

                (gene_or_gem, None, lazy_matrix)
            });

        Ok(Box::new(dataframe_parsed))
    }
}


impl Iterator for LazyMatrix {
    type Item = TupleExpressionValues;
    fn next(&mut self) -> Option<Self::Item> {
        self.inner.next()
    }
}

impl Clone for LazyMatrix {
    fn clone(&self) -> Self {
        Self::new(self.path.as_str()).unwrap()
    }
}


fn main() -> PyResult<()> {
    let lazy_matrix_1 = LazyMatrix::new("first.csv")?;
    let lazy_matrix_2 = LazyMatrix::new("second.csv")?;

    // Slow: the right-hand operand stays lazy, so it is re-read from disk
    // every time the left-hand operand yields a new row
    for _combination in iproduct!(lazy_matrix_1.clone(), lazy_matrix_2.clone()) {}

    // Fast: the right-hand operand is collected into memory first
    let lazy_matrix_2_collected = lazy_matrix_2.collect::<Vec<TupleExpressionValues>>();
    for _combination in iproduct!(lazy_matrix_1, lazy_matrix_2_collected) {}

    Ok(())
}

The problem is that this solution is much slower than the collected matrix. Is there a way to optimise this code? I was thinking about reusing the created Reader, but I don't think that's the slow part.
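
For reference, this is roughly what I meant by reusing the Reader: keeping it alive and rewinding it on every clone instead of reopening the file. It's just a sketch; ReusableReader and rewind are placeholder names, not part of my crate.

use csv::{Position, Reader, ReaderBuilder};
use std::fs::File;

// Sketch: keep one Reader alive and seek back to the first data record
// instead of rebuilding the Reader (and reopening the file) on every clone.
struct ReusableReader {
    reader: Reader<File>,
    data_start: Position, // position of the first data record
}

impl ReusableReader {
    fn new(path: &str) -> csv::Result<Self> {
        let mut reader = ReaderBuilder::new().delimiter(b'\t').from_path(path)?;
        reader.headers()?; // force the header row to be read
        let data_start = reader.position().clone();
        Ok(ReusableReader { reader, data_start })
    }

    // Rewind to the first data record without touching the filesystem again
    fn rewind(&mut self) -> csv::Result<()> {
        self.reader.seek(self.data_start.clone())
    }
}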

Any kind of help would be really appreciated.

Of course it is slower. You can either read the whole data into memory and have it available there, which makes operations on it much faster, or not read it into memory all at once, which incurs additional performance costs since you need to periodically read from the underlying file. There's nothing you can do about this fundamental trade-off.

One thing you could try is increasing the buffer size of the reader. I checked out the CSV crate and it looks like its default buffer size is 8 kB. You could try increasing it to something reasonably big, say, 1 MB or so – this would decrease the number of times the disk needs to actually be read, without unconditionally reading the entire thing into memory.
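
If it helps, the knob for this in the csv crate is ReaderBuilder::buffer_capacity. Something like the following, adapted from your reader_from_path (1 MB is just an example size):

use csv::{Reader, ReaderBuilder};
use std::fs::File;

// Same reader as before, but with a 1 MB read buffer instead of the
// default 8 kB, so the underlying file is hit far less often.
fn reader_with_big_buffer(path: &str) -> csv::Result<Reader<File>> {
    ReaderBuilder::new()
        .delimiter(b'\t')
        .buffer_capacity(1024 * 1024) // 1 MB
        .from_path(path)
}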

Thank you so much for your answer! Unfortunately, increasing the buffer size didn't help. I've adopted a strategy where, if the file is small, it's collected; otherwise it uses LazyMatrix.
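
In case it's useful, the strategy looks roughly like this (a sketch only; the Matrix enum name and the 100 MB threshold are placeholders, not the exact values in my crate):

use std::fs;

// Hypothetical threshold: files below it get collected into memory,
// bigger ones stay lazy.
const COLLECT_THRESHOLD_BYTES: u64 = 100 * 1024 * 1024;

enum Matrix {
    Collected(Vec<TupleExpressionValues>),
    Lazy(LazyMatrix),
}

fn load_matrix(path: &str) -> PyResult<Matrix> {
    let file_size = fs::metadata(path)
        .map_err(|e| GGCAError::new_err(format!("Could not read metadata of '{}': {}", path, e)))?
        .len();

    let lazy = LazyMatrix::new(path)?;
    Ok(if file_size <= COLLECT_THRESHOLD_BYTES {
        Matrix::Collected(lazy.collect())
    } else {
        Matrix::Lazy(lazy)
    })
}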