How to efficiently read large data for functions that take bytes as parameters?

I want to read data from a source that implements the Read trait (e.g., a file or standard input) and process it in some way: for example, generating a QR code from the input data, or generating a PNG image from an input SVG image. If a function like this takes bytes (e.g., &[u8], &str, impl AsRef<[u8]>, etc.), how can I read large input data efficiently?

I can think of ways to read it as a Vec<u8> or String like this:

use std::{fs, str};

fn main() -> anyhow::Result<()> {
    // Reads the entire file into memory at once.
    let input = fs::read("foo.txt")?;
    // str::Utf8Error converts into anyhow::Error, so `?` replaces the branch.
    str::from_utf8(&input)?;
    println!("OK");
    Ok(())
}

I think this approach requires memory equal to the file size, so if there is a way to do it that uses less memory, I'd like to use that.
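For the UTF-8 check specifically, a chunked version that never holds more than one fixed-size buffer is possible. This is only a sketch (the function name and buffer size are my own); the subtle part is carrying an incomplete multi-byte character across chunk boundaries, which Utf8Error::error_len and valid_up_to make detectable:

```rust
use std::io::{self, Read};
use std::str;

// Check that everything `reader` yields is valid UTF-8 using a fixed-size
// buffer, so memory use stays constant no matter how large the input is.
fn is_utf8_stream(mut reader: impl Read) -> io::Result<bool> {
    let mut buf = [0u8; 8192];
    let mut carry = 0; // bytes of an incomplete character kept from the previous chunk
    loop {
        let n = reader.read(&mut buf[carry..])?;
        if n == 0 {
            // End of input: any leftover bytes mean a truncated character.
            return Ok(carry == 0);
        }
        let filled = carry + n;
        match str::from_utf8(&buf[..filled]) {
            Ok(_) => carry = 0,
            Err(e) if e.error_len().is_none() => {
                // The chunk ends in the middle of a character; move the
                // incomplete tail to the front and read more after it.
                let valid = e.valid_up_to();
                carry = filled - valid;
                buf.copy_within(valid..filled, 0);
            }
            Err(_) => return Ok(false), // genuinely invalid bytes
        }
    }
}

fn main() -> io::Result<()> {
    // &[u8] implements Read, so slices work as stand-ins for files here.
    assert!(is_utf8_stream("héllo wörld".as_bytes())?);
    assert!(!is_utf8_stream(&[0x66u8, 0xff][..])?);
    println!("OK");
    Ok(())
}
```

With a real file you would pass File::open("foo.txt")? (or a BufReader around it) instead of the in-memory slices.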

If the function takes &[u8] (or anything like that) which must be the entire file contents, then it is demanding that you load the entire file into memory first. You should consider such functions unfit for the purpose of processing large files.

A suitable interface would be for the function to accept Read or BufRead, or to have some state type so that you can pass bytes in as you read them rather than passing all bytes at once.
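For instance (the function name and what it computes are illustrative, not from any particular crate), a function can accept impl BufRead and work through the input with fill_buf/consume, never holding more than one buffer's worth of data:

```rust
use std::io::{self, BufRead};

// Count lines and bytes from any buffered reader, one chunk at a time,
// as a stand-in for real streaming processing (hashing, encoding, ...).
fn process(mut input: impl BufRead) -> io::Result<(u64, u64)> {
    let (mut lines, mut bytes) = (0u64, 0u64);
    loop {
        let chunk = input.fill_buf()?; // borrow the next buffered chunk
        if chunk.is_empty() {
            break; // end of input
        }
        bytes += chunk.len() as u64;
        lines += chunk.iter().filter(|&&b| b == b'\n').count() as u64;
        let n = chunk.len();
        input.consume(n); // mark the chunk as processed
    }
    Ok((lines, bytes))
}

fn main() -> io::Result<()> {
    // Works with anything buffered: BufReader::new(File::open("foo.txt")?),
    // io::stdin().lock(), or even an in-memory &[u8].
    let (lines, bytes) = process(io::stdin().lock())?;
    println!("{lines} lines, {bytes} bytes");
    Ok(())
}
```

The alternative "state type" design is the same idea inside out: the caller owns the loop and pushes each chunk into a struct (the way streaming hashers expose an update method), which also keeps memory use independent of input size.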

It is possible to work with a demand for &[u8] by memory-mapping the file, so that it can be loaded implicitly by the operating system as the memory is accessed, but this technique causes your program to exhibit undefined behavior if the file is modified while being read, and should therefore be a last resort.


Note that SVG-to-PNG is a rendering pass. It's entirely fine to load the whole SVG into memory for such a thing, because SVGs are vector data, not raster data.

Some image handling, like processing massive TIFFs from NASA, can be worth doing incrementally. But I don't know of any case with reading SVGs where it's worth doing anything other than just loading the whole thing into memory. You could use an XML reader that works on Read to avoid reading it as a byte slice, but that's a relatively small optimization because you'll probably load the whole thing as its object model into memory anyway.

PNGs are written line by line though, IIRC, so you could usefully be incremental there, perhaps, rather than needing the whole thing in memory for the write.

But overall, these days it's easy to get more than enough RAM that it's often fine to just fs::read things and not worry about it. Now that even phones often have 8 GB of RAM, it's a completely different world from back in the 32-bit days where even a nice desktop couldn't load over 2 GB without a bunch of complicated ceremony.

