Effect on benchmarking due to platform

Hi everyone, I recently started using cargo bench in my project together with Criterion. I'm curious whether either of them is affected, directly or measurably, by the operating system the benchmarks run on. Does the OS have any impact on the results?

That's a fairly broad question and thus the answer is "it depends".
Performance results might be the same if you were to measure under different OSes on the same hardware... or they might turn out different. Which one you get depends on whether the thing you're measuring is affected by the differences between the OSes.

The differences can be fairly generic, such as one OS being configured to manage CPU clock frequency differently from the other (cargo bench measures wall-clock time rather than CPU instructions executed, so the clock rate matters). Or, if you're measuring multi-threaded code, locking may have different performance characteristics, or the allocator may have an impact...
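To make the wall-clock point concrete, here is a minimal std-only sketch of a wall-clock timing loop. It is not how cargo bench or Criterion are actually implemented, and mean_wall_clock and sum_of_squares are hypothetical names chosen for illustration. Anything that slows the CPU down between the two Instant readings (a lower clock frequency, preemption by the OS) ends up in the result.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Runs `f` `iters` times (assumes `iters >= 1`) and returns the average
/// wall-clock time per call. Wall-clock time includes everything that happens
/// in between: CPU frequency changes, OS preemption, page faults, etc.
fn mean_wall_clock<F: FnMut() -> u64>(iters: u32, mut f: F) -> Duration {
    let start = Instant::now();
    for _ in 0..iters {
        // black_box keeps the compiler from optimizing the call away.
        black_box(f());
    }
    start.elapsed() / iters
}

/// A small compute-bound workload used only for illustration.
fn sum_of_squares(n: u64) -> u64 {
    (0..n).map(|x| x.wrapping_mul(x)).fold(0, u64::wrapping_add)
}

fn main() {
    let per_call = mean_wall_clock(1_000, || sum_of_squares(black_box(10_000)));
    println!("mean wall-clock time per call: {:?}", per_call);
}
```

Running the same binary with a different power profile or CPU governor would be expected to change the printed time even though the work is identical.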

On the other hand, if you're doing a single-threaded, compute-bound microbenchmark that doesn't allocate much, run it on identical hardware, and measure CPU instructions, then the OS usually doesn't matter. Even then, there can be exceptions.
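A sketch of that kind of benchmark, assuming an allocation-free workload (FNV-1a hashing, picked arbitrarily) whose input is allocated once, outside the timed region. Note that it still measures wall-clock time, not instruction counts; counting instructions needs hardware performance counters, which this sketch does not use. Keeping the minimum over many samples is one common way to reduce noise for this kind of code, on the reasoning that interference can only make a run slower. fnv1a and min_time are hypothetical helpers.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Compute-bound, allocation-free workload: 64-bit FNV-1a hash of a byte slice.
fn fnv1a(data: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325; // FNV-1a 64-bit offset basis
    for &b in data {
        hash ^= b as u64;
        hash = hash.wrapping_mul(0x100000001b3); // FNV-1a 64-bit prime
    }
    hash
}

/// Times `samples` runs of `f` and keeps the fastest one.
fn min_time<F: FnMut() -> u64>(samples: usize, mut f: F) -> Duration {
    let mut best = Duration::MAX;
    for _ in 0..samples {
        let start = Instant::now();
        black_box(f());
        best = best.min(start.elapsed());
    }
    best
}

fn main() {
    // Allocate the input once, outside the timed region.
    let data = vec![0xABu8; 64 * 1024];
    let best = min_time(100, || fnv1a(black_box(data.as_slice())));
    println!("fastest of 100 runs: {:?}", best);
}
```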

There's a bunch of things that vary depending on the OS.

  • In Rust, you can compile different code depending on the OS. The standard library does a lot of this to map each OS's syscalls onto its public, platform-independent interface, and there are also types like OsString that exist because OSes represent strings differently.
  • For any language, the syscalls it can use are decided by the OS. The performance of similar (even identical) syscalls will differ between OSes, and if an OS doesn't have a particular syscall, you may need to fall back to other, less efficient syscalls for the same task.
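A minimal example of OS-dependent compilation with #[cfg(target_os = "...")], the same attribute mechanism the standard library uses for its platform-specific code. The platform function is hypothetical; std::env::consts::OS is a real constant in std.

```rust
// Only one of these definitions is compiled into any given build;
// the others are removed at compile time.
#[cfg(target_os = "linux")]
fn platform() -> &'static str {
    "linux"
}

#[cfg(target_os = "windows")]
fn platform() -> &'static str {
    "windows"
}

#[cfg(target_os = "macos")]
fn platform() -> &'static str {
    "macos"
}

#[cfg(not(any(target_os = "linux", target_os = "windows", target_os = "macos")))]
fn platform() -> &'static str {
    "other"
}

fn main() {
    println!("compiled for: {}", platform());
    // std exposes the target OS as a constant as well.
    println!("std::env::consts::OS = {}", std::env::consts::OS);
}
```

A benchmark that calls into std on two OSes may therefore be timing two genuinely different code paths.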

Even if you don't use any syscalls, there are still many possible variations.

  • The OS can interrupt your benchmark, and each OS does this with a different frequency and for different lengths of time. Measurements affected by such interruptions are usually considered useless, and good benchmarking will ignore these occurrences.
  • The OS might change hardware operation parameters before running your code, like when mitigating CPU timing vulnerabilities or managing power.
  • The time-related functions may behave differently, with different accuracy and precision, affecting how your benchmarks are measured.
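For the interruption point, one common way to discard samples inflated by the OS is Tukey's fences: compute the interquartile range (IQR) and drop anything outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]. A sketch with hypothetical helper names and made-up sample data; it assumes a non-empty set of samples.

```rust
/// Value at quantile `q` (0.0..=1.0) of a non-empty, sorted slice,
/// interpolating linearly between neighbouring samples.
fn quantile(sorted: &[f64], q: f64) -> f64 {
    let pos = q * (sorted.len() - 1) as f64;
    let lo = pos.floor() as usize;
    let hi = pos.ceil() as usize;
    let frac = pos - lo as f64;
    sorted[lo] + (sorted[hi] - sorted[lo]) * frac
}

/// Keeps only samples inside Tukey's fences. Runs slowed down by an
/// interrupt or context switch tend to land above the upper fence.
fn drop_outliers(samples: &[f64]) -> Vec<f64> {
    let mut sorted = samples.to_vec();
    sorted.sort_by(f64::total_cmp);
    let q1 = quantile(&sorted, 0.25);
    let q3 = quantile(&sorted, 0.75);
    let iqr = q3 - q1;
    let (low, high) = (q1 - 1.5 * iqr, q3 + 1.5 * iqr);
    sorted.into_iter().filter(|&x| x >= low && x <= high).collect()
}

fn main() {
    // Hypothetical per-iteration timings in ns; 950.0 mimics an interrupted run.
    let samples = [101.0, 99.0, 100.0, 102.0, 98.0, 100.0, 950.0, 101.0];
    let kept = drop_outliers(&samples);
    let mean = kept.iter().sum::<f64>() / kept.len() as f64;
    // prints: kept 7 of 8 samples, mean = 100.1 ns
    println!("kept {} of {} samples, mean = {:.1} ns", kept.len(), samples.len(), mean);
}
```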

As a side note, comparing benchmarks that weren't run on the same hardware and platform is usually pretty useless. You usually want to benchmark against a past version on the same machine and OS.
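Criterion supports this workflow through its --save-baseline and --baseline command-line options. The arithmetic behind such a comparison can be sketched as below, comparing the medians of two hypothetical sample sets recorded on the same machine and OS. This is a simplification: Criterion also applies statistical tests rather than just comparing medians, and median and percent_change are hypothetical helpers.

```rust
/// Median of a non-empty set of samples.
fn median(samples: &[f64]) -> f64 {
    let mut s = samples.to_vec();
    s.sort_by(f64::total_cmp);
    let mid = s.len() / 2;
    if s.len() % 2 == 0 {
        (s[mid - 1] + s[mid]) / 2.0
    } else {
        s[mid]
    }
}

/// Relative change of the current median vs. the baseline median, in percent.
/// Negative means the new version is faster.
fn percent_change(baseline: &[f64], current: &[f64]) -> f64 {
    let (b, c) = (median(baseline), median(current));
    (c - b) / b * 100.0
}

fn main() {
    // Hypothetical timings (ns) for the old and new versions.
    let baseline = [120.0, 118.0, 121.0, 119.0, 120.0];
    let current = [110.0, 108.0, 111.0, 109.0, 110.0];
    // prints: change: -8.3%
    println!("change: {:+.1}%", percent_change(&baseline, &current));
}
```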


Assuming the hardware stays exactly the same, for benchmarks with just pure computation running on a single thread, the OS has little influence. However, once you start allocating memory and running multiple threads, the OS may influence the results.
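A sketch of a workload where the OS starts to matter: the work is split across threads and each thread allocates its own buffer, so thread creation, scheduling and the allocator all fall inside the timed region. Rust's default global allocator is the system allocator, so allocation cost depends on the platform's implementation. parallel_sum is a hypothetical function; it assumes threads >= 1 and uses std::thread::scope (Rust 1.63+).

```rust
use std::thread;
use std::time::Instant;

/// Splits 0..n into `threads` chunks; each thread collects its chunk into a
/// freshly allocated Vec and sums it. Assumes `threads >= 1`.
fn parallel_sum(n: u64, threads: u64) -> u64 {
    let chunk = (n + threads - 1) / threads; // ceiling division
    thread::scope(|s| {
        let handles: Vec<_> = (0..threads)
            .map(|t| {
                s.spawn(move || {
                    let start = t * chunk;
                    let end = ((t + 1) * chunk).min(n);
                    // Allocation inside the timed region is intentional here.
                    let v: Vec<u64> = (start..end).collect();
                    v.iter().sum::<u64>()
                })
            })
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).sum()
    })
}

fn main() {
    let n = 1_000_000;
    for threads in [1, 2, 4, 8] {
        let start = Instant::now();
        let total = parallel_sum(n, threads);
        println!("{threads} thread(s): sum = {total}, took {:?}", start.elapsed());
    }
}
```

Running this on two OSes with the same hardware is one way to see the scheduling and allocator differences mentioned above, whereas the single-threaded sketches earlier are less likely to show them.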