Hi, I recently began to learn Rust (after using mostly Python for the last couple of years). As part of my training, I am translating some system scripts to Rust. Among these is a script which traverses a directory tree, creates a hash digest for each file, and compares the digest to a baseline. I noticed that the Rust version of this script takes 3x longer to execute, and tracked the difference down to the hashlib / sha256 performance. The Python (test) script looks like this:
#!/usr/local/bin/python3
import hashlib

def get_hash(file_path):
    myhash = hashlib.sha256()
    with open(file_path, 'rb') as file:
        content = file.read()
        myhash.update(content)
    return myhash.hexdigest()

if __name__ == '__main__':
    filepath = "/path/to/file"
    get_hash(filepath)
The Rust code looks almost identical:
use sha256::try_digest;
use std::path::Path;

fn get_hash(filepath: &str) -> String {
    let input = Path::new(filepath);
    let hash_digest = try_digest(input).unwrap();
    hash_digest
}

fn main() {
    let filepath = "/path/to/file";
    get_hash(filepath);
}
The test file at /path/to/file is 229 MB (to make the difference more pronounced). A release build of the above Rust code takes 0.99 seconds to complete, while the Python code finishes in 0.19 seconds. I am aware that Python relies on a C implementation for hashlib, but I still wonder why Rust takes more than 5x longer (in this isolated example) and 3x longer (in the real-world scenario with many small files described above).
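To check whether the gap comes from reading the file or from the hashing itself, the two steps could be timed separately. Below is a minimal sketch for the Python side, assuming the whole file fits in memory as in get_hash above. The name timed_hash is only illustrative and is not part of my actual script.

```python
import hashlib
import time


def timed_hash(file_path):
    """Return (hex digest, read seconds, hash seconds) for one file."""
    start = time.perf_counter()
    with open(file_path, 'rb') as file:
        content = file.read()  # whole file in memory, as in get_hash
    after_read = time.perf_counter()
    # Hash the bytes that are already in memory, so this step excludes I/O.
    digest = hashlib.sha256(content).hexdigest()
    after_hash = time.perf_counter()
    return digest, after_read - start, after_hash - after_read
```

Calling timed_hash("/path/to/file") returns the digest together with the two durations. Timing the read and the hash the same way on the Rust side should show which of the two steps accounts for the difference.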
I am grateful for any hints.
Jan