How to read large Arrow IPC files in batches for transformation with low memory usage?

Hi everyone,

I'm working on a Rust-based data processing pipeline that uses the polars and arrow2 crates. I have a flow where I read CSVs in batches and write them to an Arrow IPC file using IpcWriter with ZSTD compression enabled:

use std::fs::File;
use polars::prelude::*;

// Create the output file and a batched IPC writer with ZSTD compression.
let file = File::create(&arrow_file_path).unwrap();
let mut writer = IpcWriter::new(file)
    .with_compression(Some(IpcCompression::ZSTD))
    .with_parallel(true)
    .batched(polars_schema);

// Pull CSV rows in batches of 10 DataFrames, transform and rename each one,
// and append it to the IPC file as its own record batch.
while let Some(batches) = csv_batched_reader.next_batches(10).unwrap() {
    for df in batches {
        let transformed_batch = apply_transformation(df, tikv_schema.transformations.to_owned())?;
        let mut updated_batch = rename_columns(transformed_batch, tikv_schema.rename_fields.to_owned())?;

        writer.batch(&mut updated_batch).unwrap();
    }
}
writer.finish().unwrap();

This part works great and creates a compressed Arrow IPC file (~400MB in size). However, the issue arises when I try to read the file back for further processing.

If I use LazyFrame::scan_ipc and collect the result, or even try IpcStreamReader, the entire file is loaded into memory and the process crashes from excessive RAM usage, despite the machine having 24 GB of RAM. I suspect this is because the file was written in batched mode with IpcWriter::batched, but I can't find a way to read it back in batches.
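
For reference, the read side currently looks roughly like this (a simplified sketch of my actual code; the ScanArgsIpc options may differ slightly between polars versions):

use polars::prelude::*;

// Lazily scan the IPC file, then materialize it. The collect() call is where
// the whole ~400 MB (compressed) file gets decoded into memory at once.
let lf = LazyFrame::scan_ipc(&arrow_file_path, ScanArgsIpc::default())?;
let df = lf.collect()?;
// ...further transformations on the full DataFrame...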

My questions are:

  1. Is there a recommended way to read an Arrow IPC file (written using IpcWriter::batched) in batches without loading the entire file into memory?
  2. Can IpcStreamReader, or any other reader in arrow2 or polars, read such files incrementally? (See the sketch after this list for the direction I've been exploring.)
  3. Would it be better to use the Arrow IPC file format (FileWriter/FileReader) instead of the stream format for this use case?
  4. Is there a recommended approach for efficiently working with large Arrow files in Rust when both writing and reading happen in chunks?
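
To make question 2 concrete, here is the direction I've been exploring with arrow2's io::ipc::read module, which appears to iterate over record batches one at a time. This is only a minimal sketch under a few assumptions: arrow2 around 0.17 (the FileReader::new signature differs between versions), the io_ipc_compression feature enabled so the ZSTD-compressed batches can be decoded, and the conversion of each Chunk back into a polars DataFrame left open, since that's exactly the part I'm unsure about:

use std::fs::File;
use arrow2::io::ipc::read::{read_file_metadata, FileReader};

let mut file = File::open(&arrow_file_path)?;
let metadata = read_file_metadata(&mut file)?;

// FileReader implements Iterator over Chunks (record batches), so each batch
// should be decoded on demand instead of the whole file at once.
let reader = FileReader::new(file, metadata, None, None);
for chunk in reader {
    let chunk = chunk?;
    // ...convert the chunk into a polars DataFrame and transform it here...
    println!("read a batch with {} rows", chunk.len());
}

Is this the right track, or is there something built into polars itself that I'm missing?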

Any suggestions or best practices would be greatly appreciated!