Something you could try to gather more data: see what effect duplicating the work in your program has. E.g. instead of
```rust
let start = SystemTime::now().duration_since(UNIX_EPOCH).unwrap();
// do my stuff
let end = SystemTime::now().duration_since(UNIX_EPOCH).unwrap();
```
do
```rust
let start = SystemTime::now().duration_since(UNIX_EPOCH).unwrap();
for _ in 0..10 {
    // do my stuff
}
let end = SystemTime::now().duration_since(UNIX_EPOCH).unwrap();
```
With this modification you can see whether the difference between the two measured times behaves like a constant summand or a constant factor. Also, with a factor of 10 (or more), the times in question reach more “human” time scales, so you can judge for yourself whether the program actually runs for, say, 400 ms or 7980 ms. (On that note, a difference of 40 ms vs. 798 ms should already be noticeable to human perception, too: does the program feel like it runs almost instantly, or does it take almost a full second?)
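For concreteness, here's a minimal, self-contained sketch of that second variant; `do_my_stuff` is just a hypothetical placeholder for your own workload, and its result is printed so the work isn't trivially discarded:

```rust
use std::time::{SystemTime, UNIX_EPOCH};

// Hypothetical placeholder for the work you actually want to time.
fn do_my_stuff() -> u64 {
    (0..1_000_000u64).sum()
}

fn main() {
    let start = SystemTime::now().duration_since(UNIX_EPOCH).unwrap();
    let mut total = 0;
    for _ in 0..10 {
        total += do_my_stuff();
    }
    let end = SystemTime::now().duration_since(UNIX_EPOCH).unwrap();

    // Roughly 10x the single-run time suggests the measured work dominates;
    // a mostly unchanged time suggests a constant overhead (startup, I/O, ...).
    println!("10 iterations took {:?} (result: {total})", end - start);
}
```

(`std::time::Instant` would work just as well for this kind of measurement and is monotonic; `SystemTime` is kept here only to match the snippets above.)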
One thing to pay attention to when using `cargo run --release` is that compilation might happen as part of the command. Always make sure that the code is already compiled before you measure the execution time of `cargo run --release`, e.g. by running it twice and only measuring the second run. Alternatively, compile the program with `cargo run --release` or `cargo build --release` first, and then execute the binary in the `target/release` directory directly to measure its run time.
Another factor that may be affecting your runtime measurements is the CPU’s dynamic frequency scaling (although that alone may not explain a difference between 40 ms and 800 ms).
This may be overkill, but you can also rely on criterion for benchmarking your code.
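As a rough sketch, a minimal criterion benchmark could look like the code below. It assumes criterion has been added under `[dev-dependencies]`, the file lives at `benches/my_benchmark.rs`, and the corresponding `[[bench]]` entry in Cargo.toml sets `harness = false`; `do_my_stuff` is again a stand-in for your own code:

```rust
// benches/my_benchmark.rs
use criterion::{criterion_group, criterion_main, Criterion};
use std::hint::black_box;

// Stand-in for the code you actually want to measure.
fn do_my_stuff(n: u64) -> u64 {
    (0..n).sum()
}

fn bench_do_my_stuff(c: &mut Criterion) {
    c.bench_function("do_my_stuff", |b| {
        // black_box hides the constant input from the optimizer.
        b.iter(|| do_my_stuff(black_box(1_000_000)))
    });
}

criterion_group!(benches, bench_do_my_stuff);
criterion_main!(benches);
```

Run it with `cargo bench`; criterion takes care of warming up, running the code for long enough, and doing statistical analysis of the samples.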
Modern CPUs have variable speed, and the CPU's speed can vary by way more than the differences in code speed you're trying to measure. Things that affect execution include caches, clock speed boosts (which vary a lot with CPU temperature), and the CPU's energy-saving states.
There's also a lot of variability in the OS: multitasking, allocation of threads to faster or slower cores, hyperthreading, virtual memory page faults, disk caches. Timers also have limited resolution.
And then there's the optimizer, which will sometimes see that you're not using the results of the code running in your test loops, conclude it's pointless, and delete the code you're trying to measure, without telling you.
For all those reasons you need a proper benchmarking library that runs the tests for long enough and provides a "black box" for the optimizer to ensure the test code isn't optimized out.
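To illustrate that last point: `std::hint::black_box` is the standard library's version of such a "black box" (benchmarking libraries like criterion offer the same facility). Just as a sketch, with `expensive_work` as a made-up placeholder:

```rust
use std::hint::black_box;
use std::time::Instant;

// Made-up placeholder for the computation being measured.
fn expensive_work(n: u64) -> u64 {
    (0..n).map(|x| x.wrapping_mul(x)).sum()
}

fn main() {
    let start = Instant::now();
    for _ in 0..10 {
        // Without black_box around the input and the result, the optimizer
        // may see an unused result of a pure function on a constant input
        // and delete the call entirely.
        black_box(expensive_work(black_box(1_000_000)));
    }
    println!("10 iterations took {:?}", start.elapsed());
}
```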