How to handle large files as sources of test data?

penguin359 · July 9, 2021, 4:04am

I am developing a file system driver in Rust and using real disk images as test data to verify proper functionality. While I can test many of the basics from a small disk image, some features of the file system are only exercised by a larger disk image. One of the images I have generated as a test data source is a 2 GB image. The smaller images only amount to about 10 MB in total, so I keep them in the Git repo and have them as part of the automatic unit tests run by cargo. I am not sure what to do about the larger image needed to test the features that only get invoked on such a large disk.

The 2 GB disk image is mostly empty and compresses down to 2 MB easily. I could add it to Git as a gzipped image file or even use Git LFS so I don't cause the repo to grow arbitrarily. However, if I do that, I think I'd like the tests that rely on the larger image to run only if the files have been retrieved from LFS.
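One way to tell whether a file has actually been retrieved from LFS: when the content has not been fetched, Git LFS leaves a small text pointer file in its place, and that pointer begins with the line `version https://git-lfs.github.com/spec/v1`. The sketch below checks for that prefix. The function names are made up for illustration, and it assumes the image is tracked by LFS under the path being checked.

```rust
use std::{
    fs::File,
    io::{self, Read},
    path::Path,
};

/// First line of every Git LFS pointer file.
const LFS_POINTER_PREFIX: &[u8] = b"version https://git-lfs.github.com/spec/v1";

/// Returns true if `head` (the first bytes of a file) looks like an LFS pointer.
/// Hypothetical helper name, not part of any library.
pub fn looks_like_lfs_pointer(head: &[u8]) -> bool {
    head.starts_with(LFS_POINTER_PREFIX)
}

/// Reads just enough of `path` to decide whether it is still an LFS pointer,
/// i.e. the real content has not been fetched yet.
pub fn is_unfetched_lfs_file(path: &Path) -> io::Result<bool> {
    let mut head = Vec::with_capacity(LFS_POINTER_PREFIX.len());
    File::open(path)?
        .take(LFS_POINTER_PREFIX.len() as u64)
        .read_to_end(&mut head)?;
    Ok(looks_like_lfs_pointer(&head))
}
```

A test that needs the large image could call `is_unfetched_lfs_file` first and bail out when it returns `true`.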

The files will need to be expanded before first use, so I need a way to integrate that with the test runner. I have looked at using flate2 or something similar, but I need to be able to Read + Seek the file and Gzip compression doesn't really support Seek. If the underlying file system supports holes, then the expanded disk image should not require much more space.
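A rough sketch of the "expand once, keep the holes" idea: copy the decompressed stream into the destination in blocks and seek over any block that is entirely zero instead of writing it. On a file system that supports sparse files, the skipped ranges become holes. The `copy_sparse` name and the 64 KiB block size are arbitrary choices for this example, not anything from an existing crate.

```rust
use std::io::{self, Read, Seek, SeekFrom, Write};

/// Block size used to look for runs of zeros (an arbitrary choice).
const BLOCK: usize = 64 * 1024;

/// Copies `src` to `dst`, seeking past all-zero blocks rather than writing them.
/// Returns the number of bytes consumed from `src`.
pub fn copy_sparse<R: Read, W: Write + Seek>(mut src: R, mut dst: W) -> io::Result<u64> {
    let mut buf = vec![0u8; BLOCK];
    let mut total: u64 = 0;
    let mut trailing_hole = false;

    loop {
        let n = match src.read(&mut buf) {
            Ok(0) => break,
            Ok(n) => n,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };

        if buf[..n].iter().all(|&b| b == 0) {
            // Leave a gap instead of writing zeros.
            dst.seek(SeekFrom::Current(n as i64))?;
            trailing_hole = true;
        } else {
            dst.write_all(&buf[..n])?;
            trailing_hole = false;
        }
        total += n as u64;
    }

    if trailing_hole {
        // A seek alone does not extend the output, so write the final zero byte
        // explicitly to give the destination its full length.
        dst.seek(SeekFrom::Current(-1))?;
        dst.write_all(&[0])?;
    }
    dst.flush()?;
    Ok(total)
}
```

With flate2, the source would presumably be a `flate2::read::GzDecoder` wrapped around the compressed file and the destination a freshly created `File`. The expanded file can then be opened normally and supports Read + Seek.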

What's the best way to handle test cases that need larger data sources for full operation?

Michael-F-Bryan · July 9, 2021, 5:01am

If you can generate the images from nothing then I wouldn't commit them to your repository. Instead, generate them when they are needed and do it in such a way that you reuse the results from previous runs.

Maybe something like this?

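use once_cell::sync::Lazy;
use std::{fs::{self, File}, io::{Error, Write}, path::{Path, PathBuf}};

/// Returns a handle to the large disk image, generating it on first use.
pub fn disk_image_fixture() -> File {
    // Initialised at most once per test binary, even when tests run in parallel.
    static MY_BIG_IMAGE: Lazy<PathBuf> = Lazy::new(|| {
        let image_path = Path::new(env!("CARGO_MANIFEST_DIR"))
            .join("tests")
            .join("fixtures")
            .join("foo.img");

        // Reuse the image left behind by a previous run, if there is one.
        if !image_path.exists() {
            // File::create fails if the fixtures directory is missing.
            fs::create_dir_all(image_path.parent().unwrap()).unwrap();
            let f = File::create(&image_path).unwrap();
            generate_disk_image(f).unwrap();
        }

        image_path
    });

    File::open(&*MY_BIG_IMAGE).unwrap()
}

// Placeholder: write the image contents to `_writer`.
fn generate_disk_image(_writer: impl Write) -> Result<(), Error> { todo!() }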

penguin359 · July 9, 2021, 5:47am

It's not always easy to pre-generate the images at compile/test time. Specifically, I am developing a driver for the Apple File System as documented on their developer website. All images were generated on macOS using Apple's official tools for building the file system. However, this driver is most useful on other OSes, including Linux, FreeBSD, and Windows, and that is where most of the testing and development happens. There aren't any open source tools to generate this file system outside of macOS.

Michael-F-Bryan · July 9, 2021, 5:58am

Ah, in that case I'd generate the images once and store them with something like Git LFS.

Integrating with the test runner shouldn't be too difficult, just wrap the resource in a getter function and use some sort of synchronisation mechanism (e.g. once_cell's Lazy or std::sync::Once) to make sure you only do the setup logic (e.g. downloading from Git LFS) once.

There isn't a nice way to skip a test based on some condition at runtime (e.g. Go's T.Skipf(...)) so you'd probably need to just return early when the resource isn't present. It's not ideal because your test would be marked as passed even though it didn't really test anything, but I think that's as close as you can get with Rust's test runner.

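use std::fs::File;

// Returns None when the large image is not available.
fn disk_image_fixture() -> Option<File> { todo!() }

#[test]
fn check_mac_image() {
    let image = match disk_image_fixture() {
        Some(f) => f,
        // Nothing to test against; the test is reported as passed.
        None => return,
    };

    // ...
}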
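For what it's worth, here is one way the getter could be filled in using only the standard library. It uses `std::sync::OnceLock` (stable since Rust 1.70) in place of once_cell's `Lazy`. The `tests/fixtures/large.img` location and the `open_if_present` helper are assumptions made for this sketch, and it only checks that the file exists; any download or decompression step would go where the path is resolved.

```rust
use std::{
    env,
    ffi::OsString,
    fs::File,
    path::{Path, PathBuf},
    sync::OnceLock,
};

/// Opens `path` if it is a regular file; otherwise notes the skip and returns None.
pub fn open_if_present(path: &Path) -> Option<File> {
    if !path.is_file() {
        eprintln!("skipping: fixture {} is not present", path.display());
        return None;
    }
    Some(File::open(path).expect("fixture exists but could not be opened"))
}

/// Hypothetical fixture location: tests/fixtures/large.img under the crate root.
/// Resolved once per test binary.
fn large_image_path() -> &'static Path {
    static PATH: OnceLock<PathBuf> = OnceLock::new();
    PATH.get_or_init(|| {
        // Cargo sets CARGO_MANIFEST_DIR when it runs tests; fall back to the
        // current directory otherwise.
        let root = env::var_os("CARGO_MANIFEST_DIR").unwrap_or_else(|| OsString::from("."));
        PathBuf::from(root).join("tests").join("fixtures").join("large.img")
    })
}

/// Returns the large image, or None so the calling test can return early.
pub fn disk_image_fixture() -> Option<File> {
    open_if_present(large_image_path())
}
```

Output from passing tests is captured by default, so the skip message only shows up with `cargo test -- --nocapture`. A different approach that the built-in harness does support is to mark the large-image tests `#[ignore]` and run them explicitly with `cargo test -- --ignored`, though that is a manual opt-in rather than a runtime check.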
