New crate: “fetch-data”

Fetch data files from a URL, but only if needed. Verify contents via SHA256.

Fetch-Data first checks a local data directory and downloads only the files that are missing. It always verifies both local and newly downloaded files against a hash.
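
The check-then-download flow can be sketched with the standard library alone. This is an illustration, not the crate's implementation: `fetch_if_needed` is a hypothetical name, the `hash` and `download` steps are passed in as closures so the sketch needs no SHA256 or HTTP dependency, and returning an error on a hash mismatch is an assumption about how a mismatch might be handled.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper: return a verified local copy of `name`,
/// downloading it only if it is not already in `dir`.
fn fetch_if_needed(
    dir: &Path,
    name: &str,
    expected_hash: &str,
    hash: impl Fn(&[u8]) -> String,
    download: impl FnOnce() -> io::Result<Vec<u8>>,
) -> io::Result<PathBuf> {
    let path = dir.join(name);
    if !path.exists() {
        // The file is missing locally, so fetch it and store it.
        fs::create_dir_all(dir)?;
        fs::write(&path, download()?)?;
    }
    // Verify on every request, whether the file was just downloaded
    // or was already local.
    let actual = hash(&fs::read(&path)?);
    if actual != expected_hash {
        // Assumption: a mismatch is reported as an error.
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            format!("hash mismatch for {name}: expected {expected_hash}, found {actual}"),
        ));
    }
    Ok(path)
}
```

In real use, `hash` would compute a SHA256 digest and `download` would perform a blocking HTTP GET. A second call for the same file skips `download` entirely but still pays for the hash check.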

Fetch-Data makes it easy to download large and small sample files. For example, here we download a genomics file from GitHub (if it has not already been downloaded). We then print the size of the now local file.

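use fetch_data::sample_file;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Downloads the file on first use; later calls reuse the local copy.
    let path = sample_file("small.fam")?;
    println!("{}", std::fs::metadata(path)?.len()); // Prints 85
    Ok(())
}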


  • Thread-safe, so it can be used with Rust's multithreaded testing framework.
  • Inspired by Python's popular Pooch and our PySnpTools filecache module.
  • Avoids runtimes such as Tokio by using ureq to download files via blocking I/O.
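
One way thread safety could be combined with blocking downloads is to guard the fetch step with a mutex, so that parallel tests asking for the same file trigger a single download. The sketch below is only an illustration under that assumption; `Cache` and `ensure` are hypothetical names, not part of Fetch-Data.

```rust
use std::collections::HashSet;
use std::sync::Mutex;

/// Hypothetical cache that serializes fetches behind a mutex.
struct Cache {
    fetched: Mutex<HashSet<String>>,
}

impl Cache {
    fn new() -> Self {
        Cache { fetched: Mutex::new(HashSet::new()) }
    }

    /// Runs `download` at most once per name, even when called from
    /// many threads. Returns true if this call did the download.
    fn ensure(&self, name: &str, download: impl FnOnce()) -> bool {
        // The lock is held during the blocking download, so other
        // threads wait here instead of fetching the same file again.
        let mut fetched = self.fetched.lock().unwrap();
        if fetched.contains(name) {
            return false;
        }
        download();
        fetched.insert(name.to_string());
        true
    }
}
```

Holding one lock for the whole download keeps the logic simple but also serializes downloads of different files. For a handful of sample files used in tests, that trade-off is usually acceptable.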

Feature requests and contributions are welcome.

What surprised me while developing this:

  • Blocking I/O seems better than non-blocking (for this application)
  • Checking the hash of every file requested, even if already local, might be slower, but guarantees contents are as expected.
  • Question: Could Rust's ownership model make sure that functions using a downloaded file only read it and don't change it (which could mess up other functions that are using it on a different thread)?
  • Rust allows global static constructors (via the ctor crate). Because such a constructor can't return an error, it stores any error for later use. The constructed value also contains a mutex.
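
The "hold errors for later" idea from the last point can be sketched without the ctor crate. Here the standard library's `OnceLock` stands in for a ctor-built static, and `Deferred`, `global`, and the `SKETCH_DATA_DIR` environment variable are all hypothetical names chosen for illustration.

```rust
use std::sync::{Mutex, OnceLock};

/// Holds either a successfully built value or the error from building it.
struct Deferred<T> {
    inner: Mutex<Result<T, String>>,
}

impl<T> Deferred<T> {
    /// Infallible constructor: any error from `init` is stored, not returned.
    fn new(init: impl FnOnce() -> Result<T, String>) -> Self {
        Deferred { inner: Mutex::new(init()) }
    }

    /// First fallible point: a stored error surfaces here, at the call site.
    fn with<R>(&self, f: impl FnOnce(&mut T) -> R) -> Result<R, String> {
        let mut guard = self.inner.lock().map_err(|_| "lock poisoned".to_string())?;
        match &mut *guard {
            Ok(value) => Ok(f(value)),
            Err(message) => Err(message.clone()),
        }
    }
}

// Stand-in for a ctor-initialized static (assumption: std-only substitute).
static GLOBAL: OnceLock<Deferred<String>> = OnceLock::new();

fn global() -> &'static Deferred<String> {
    GLOBAL.get_or_init(|| {
        // Reading a (hypothetical) environment variable may fail;
        // the failure is kept inside the Deferred value.
        Deferred::new(|| std::env::var("SKETCH_DATA_DIR").map_err(|e| e.to_string()))
    })
}
```

Callers see the initialization error only when they first use the value, for example via `global().with(|dir| dir.len())`, which is where a `?` can propagate it normally.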

-- Carl

p.s. I haven't registered yet, but I'm planning on attending RustConf 2022 in Portland. I'd love to connect with folks to discuss scientific programming and/or library API design.

