Stream File Into Memory

I need to create a file, write some header data to it, and then copy the entire contents of another file into the new file I created. The code works fine on my machine, but I'm wondering if it will work predictably everywhere.

Here's my code:

use std::fs::File;
use std::io::{Read, Write};

const BUFFER_SIZE: usize = 1024 * 32;

const HEADER: [u8; 64] = [
    0x65, 0x61, 0x33, 0x03, 0x00, 0x00, 0x00, 0x00,
    0x07, 0x76, 0x47, 0x45, 0x4F, 0x42, 0x00, 0x00,
    0x01, 0xC6, 0x00, 0x00, 0x02, 0x62, 0x69, 0x6E,
    0x61, 0x72, 0x79, 0x00, 0x00, 0x00, 0x00, 0x4F,
    0x00, 0x4D, 0x00, 0x47, 0x00, 0x5F, 0x00, 0x4C,
    0x00, 0x53, 0x00, 0x49, 0x00, 0x00, 0x00, 0x01,
    0x00, 0x40, 0x00, 0xDC, 0x00, 0x70, 0x00, 0x08,
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x4B, 0x45,
];

fn main() {
    let mut buf = [0; BUFFER_SIZE];

    let mut src_file = File::open("./assets/src.bin")
        .expect("Failed to open src.bin");

    let mut dst_file = File::create("./assets/dst.bin")
        .expect("Failed to create dst.bin");

    dst_file.write_all(&HEADER).expect("Failed to write header");

    loop {
        let bytes_read = src_file.read(&mut buf).expect("Failed to read src.bin");

        if bytes_read == 0 {
            break;
        }

        let valid_buf = &buf[..bytes_read];

        dst_file.write_all(valid_buf).expect("Failed to write dst.bin");
    }
}

I'm pretty sure the way I read files is fine, but what about write_all()? Could lower-powered devices fail partway through a write_all()? Would it be safer to use write(), and then make follow-up write() calls as needed to make sure all of the data gets written?

Would it also be better to use an array or a vector as my buffer? The buffer's size is known at compile time, so I like the idea of stack allocation over heap allocation in this case. Program memory isn't a concern either, since the buffer is meant to be small.

I think you might be able to replace your loop entirely with a call to std::io::copy.
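To make that concrete, here's a minimal sketch of the header-then-copy pattern using std::io::copy. The helper name write_with_header is my own, and it's demonstrated with in-memory buffers so it runs anywhere; in the real program you'd pass the File handles instead.

```rust
use std::io::{self, Read, Write};

// Write `header`, then stream everything from `src` into `dst`.
// Generic over Read/Write, so it works with files or in-memory buffers.
fn write_with_header<R: Read, W: Write>(
    src: &mut R,
    dst: &mut W,
    header: &[u8],
) -> io::Result<u64> {
    dst.write_all(header)?;
    // io::copy runs the read/write_all loop internally and returns
    // the number of bytes copied from src.
    io::copy(src, dst)
}

fn main() -> io::Result<()> {
    // Placeholder data; substitute File::open / File::create handles
    // and the real 64-byte header for the actual program.
    let mut src: &[u8] = b"payload";
    let mut dst: Vec<u8> = Vec::new();

    let copied = write_with_header(&mut src, &mut dst, &[0x65, 0x61, 0x33, 0x03])?;
    assert_eq!(copied, 7);
    assert_eq!(&dst[4..], &b"payload"[..]);
    println!("copied {copied} bytes");
    Ok(())
}
```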


write_all already calls write in a loop internally, so there shouldn't be much of a difference. The one distinction I can see: if write_all fails partway through, the returned error won't tell you exactly how many bytes were written before the failure. If you write the loop yourself, you know which write calls succeeded, and therefore how many bytes made it out. I'm not familiar enough with which write errors are actually recoverable to say whether you could do anything meaningful with that information, though.
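For illustration, here's a sketch of that hand-rolled loop. The function name write_all_counting is my own; it mirrors what write_all does (including retrying on Interrupted) but reports the byte count alongside any error.

```rust
use std::io::{self, Write};

// Like write_all, but on failure also reports how many bytes
// were successfully written before the error.
fn write_all_counting<W: Write>(
    dst: &mut W,
    mut buf: &[u8],
) -> Result<usize, (usize, io::Error)> {
    let mut written = 0;
    while !buf.is_empty() {
        match dst.write(buf) {
            Ok(0) => {
                let e = io::Error::new(io::ErrorKind::WriteZero, "failed to write whole buffer");
                return Err((written, e));
            }
            Ok(n) => {
                written += n;
                buf = &buf[n..];
            }
            // Interrupted is retryable; write_all skips it the same way.
            Err(e) if e.kind() == io::ErrorKind::Interrupted => {}
            Err(e) => return Err((written, e)),
        }
    }
    Ok(written)
}

fn main() {
    let mut out: Vec<u8> = Vec::new();
    match write_all_counting(&mut out, b"header+payload") {
        Ok(n) => println!("wrote all {n} bytes"),
        Err((n, e)) => eprintln!("wrote {n} bytes before error: {e}"),
    }
}
```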

As for using a vector or an array, you should be fine either way: write takes a byte slice (&[u8]), and both arrays and vectors coerce to slices, so it doesn't matter which one you start with.
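A quick self-contained check of that coercion: both a stack array and a Vec can be handed to write_all as-is.

```rust
use std::io::Write;

fn main() -> std::io::Result<()> {
    let arr = [1u8, 2, 3];        // stack-allocated array
    let vec = vec![4u8, 5, 6];    // heap-allocated vector

    // Vec<u8> implements Write, so it serves as an in-memory sink here.
    let mut out: Vec<u8> = Vec::new();

    // &[u8; 3] and &Vec<u8> both coerce to &[u8] at the call site.
    out.write_all(&arr)?;
    out.write_all(&vec)?;

    assert_eq!(out, [1, 2, 3, 4, 5, 6]);
    Ok(())
}
```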


You basically re-invented BufReader.

Even though BufReader uses heap allocation, I'd prefer the standard tool unless there's a significant, measurable bottleneck.


I was wondering about this too, but I read somewhere else that BufReader was meant for much smaller chunks (like 8KB or something like that). If it's more appropriate to use BufReader, then I'd rather use that to avoid reinventing the wheel.

This is perfect for my use-case. I had a feeling something like this existed, but I couldn't find it (the standard library is MASSIVE). Thanks!

It's configurable: BufReader::with_capacity
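For example, matching the 32 KiB buffer from the original code (demonstrated on an in-memory reader; a File works the same way):

```rust
use std::io::{BufReader, Read};

fn main() -> std::io::Result<()> {
    let data: &[u8] = b"some payload";

    // Same 32 KiB buffer size as the hand-rolled version,
    // but owned and managed by BufReader.
    let mut reader = BufReader::with_capacity(32 * 1024, data);
    assert_eq!(reader.capacity(), 32 * 1024);

    let mut out = String::new();
    reader.read_to_string(&mut out)?;
    assert_eq!(out, "some payload");
    Ok(())
}
```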