I have a test file for my crate that's a bit too large to distribute as a plain file in the repo (~30MB). I'm looking for the best way to download it only when building (or running) tests.
Apart from plugging into the build (not sure if I can detect building tests from inside build.rs), I'm looking for a way to actually download the file. Shelling out to curl/wget is an option, though ugly. Just about any http client is async nowadays and pulling in and building the whole stack might well take more time (and data) than the download itself.
Are there any established approaches to this? I don't need anything fancy, just to download a single file that probably won't ever change (and not redownload it with every build/test run, I guess).
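For reference, the "shell out to curl, but only once" idea could look roughly like the sketch below. It assumes curl is on PATH; the function name ensure_test_file and the ".part" temporary suffix are made up for illustration, and there is no checksum verification.

```rust
use std::io;
use std::path::Path;
use std::process::Command;

/// Makes sure `dest` exists, downloading it from `url` with `curl` if needed.
/// Returns Ok(true) if a download happened, Ok(false) if the file was already there.
/// (Hypothetical helper; assumes `curl` is installed and on PATH.)
fn ensure_test_file(url: &str, dest: &Path) -> io::Result<bool> {
    // Cached from a previous run: do not touch the network at all.
    if dest.exists() {
        return Ok(false);
    }
    if let Some(parent) = dest.parent() {
        std::fs::create_dir_all(parent)?;
    }
    // Download to a temporary name first so an interrupted download
    // is never mistaken for the finished file.
    let tmp = dest.with_extension("part");
    let status = Command::new("curl")
        .args(&["--fail", "--location", "--silent", "--show-error", "--output"])
        .arg(&tmp)
        .arg(url)
        .status()?;
    if !status.success() {
        let _ = std::fs::remove_file(&tmp);
        return Err(io::Error::new(
            io::ErrorKind::Other,
            format!("curl exited with {}", status),
        ));
    }
    std::fs::rename(&tmp, dest)?;
    Ok(true)
}
```

A test helper could call this with a destination somewhere under the target directory, so the file survives between test runs and is only fetched again after a clean.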
First of all, sorry if I'm sounding too negative about all the suggestions to bundle the file myself. The file is really a sample file from the parent project I'm integrating with (writing a Rust SDK for) and they don't bundle it either (they download it on demand but they have it easier with cmake ;)) so I'd rather follow their steps. Still, it may turn out to be too much of a hassle, so I greatly appreciate all the suggestions.
I thought about a dev dependency but I'm not entirely sure how I'd go about it. Sure, put a dummy lib.rs and the test file in a single package, but then what? How do I find the file during my build? I can include_bytes! it, expose a function that writes it back to disk (I need an actual on-disk file) and call that from my tests, but that feels like overkill.
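The include_bytes! variant described above might look like the following sketch of the data crate's lib.rs. The byte string is a stand-in so the snippet compiles on its own; the file name sample.bin and the function write_sample are hypothetical.

```rust
use std::io;
use std::path::{Path, PathBuf};

// Stand-in for the real data. In the data crate this would be something like
// `include_bytes!("../data/sample.bin")` (hypothetical path).
static SAMPLE: &[u8] = b"placeholder sample contents";

/// Writes the embedded sample into `dir` and returns the path of the file.
/// The write is skipped if a file of the same length is already there.
pub fn write_sample(dir: &Path) -> io::Result<PathBuf> {
    std::fs::create_dir_all(dir)?;
    let path = dir.join("sample.bin");
    let up_to_date = std::fs::metadata(&path)
        .map(|m| m.len() == SAMPLE.len() as u64)
        .unwrap_or(false);
    if !up_to_date {
        std::fs::write(&path, SAMPLE)?;
    }
    Ok(path)
}
```

Tests would then list the data crate under dev-dependencies and call write_sample with a scratch directory before opening the file.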
The tests in question only use the public API, so worst case I could make a separate crate with the test file and the cases that use it.
The sad thing is that cargo already has a built-in HTTP client that would work perfectly fine, but there's no way to use it.
Downloading of files during build is problematic, not only because it pulls in a ton of dependencies, but also because some users want to build packages offline, vendor them, etc. Cargo will also verify checksums, but your downloader probably wouldn't, unless you added even more deps and code.
include_bytes! would be okay if you need it as &[u8] in Rust.
If you need it only as a file on disk, it's still doable, but a bit more clunky. Crates can communicate with each other via env vars set from their build scripts if the providing crate has the links manifest key (despite the name, it doesn't have to link to anything).
Add links = "globallyuniquestringhere" to the package with the data file
Add build.rs that does println!("cargo::metadata=data_file_path={}", std::fs::canonicalize("path/to/data/file").unwrap().display()). (With Cargo older than 1.77, the directive is spelled cargo:data_file_path={} instead.)
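In your crate with tests, read DEP_GLOBALLYUNIQUESTRINGHERE_DATA_FILE_PATH to get the path. Cargo sets this variable for the build script of the depending crate, so read it in that crate's build.rs with std::env::var and re-export it with a cargo::rustc-env directive if you want env!(...) to see it in the tests themselves.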
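The steps above can be sketched as a few helpers that build the relevant strings. This assumes the cargo::metadata=KEY=VALUE directive syntax (Cargo 1.77+) and that the DEP_* variable is delivered to the build script of the depending crate; the helper names and the DATA_FILE_PATH variable are invented for illustration.

```rust
use std::path::Path;

/// Directive the data crate's build.rs prints to publish the file path.
/// (Uses the `cargo::metadata=` form; older Cargo expects `cargo:KEY=VALUE`.)
fn metadata_directive(key: &str, path: &Path) -> String {
    format!("cargo::metadata={}={}", key, path.display())
}

/// Name of the variable Cargo derives from the `links` value and the key.
fn dep_var_name(links: &str, key: &str) -> String {
    format!("DEP_{}_{}", links.to_uppercase(), key.to_uppercase())
}

/// Directive the depending crate's build.rs prints to make the value
/// visible to `env!` when its own code and tests are compiled.
fn rustc_env_directive(name: &str, value: &str) -> String {
    format!("cargo::rustc-env={}={}", name, value)
}

// Data crate, build.rs (sketch):
//     let p = std::fs::canonicalize("path/to/data/file").unwrap();
//     println!("{}", metadata_directive("data_file_path", &p));
//
// Crate with tests, build.rs (sketch):
//     let var = dep_var_name("globallyuniquestringhere", "data_file_path");
//     let p = std::env::var(&var).unwrap();
//     println!("{}", rustc_env_directive("DATA_FILE_PATH", &p));
//
// Tests: `env!("DATA_FILE_PATH")` (hypothetical variable name).
```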
To add to this, downloading test/build data during a test/build also makes that test/build brittle in another way: when the servers go down for any reason, so does your ability to test/build.
Especially if the test data file doesn't change too often, you can just go ahead and check it into git. I've taken that approach for larger files (that also barely ever change) and it works fine; you just have a slightly larger initial download.