Benchmarking small changes to Rust code

Often, one might want to change the implementation of a function which is called as part of a deep call stack. Say we have something like p(x) = f(g(h(x))), where h(x) might be called a number of times. We are not interested in the performance of h(x) in isolation because h(x) is never called alone. We are only interested in the performance of p(x).

Ideally, one would like to benchmark the performance of p(x) before and after a change in the implementation of h(x). Something like this:

p1(x) = f(g(h1(x)))
p2(x) = f(g(h2(x)))

for two possible implementations h1(x) and h2(x). But this might require rewriting a large number of functions, which is messy and error-prone.

Ideally, I'd like to be able to do something like:

branch! {
  "default" => {
    pub fn h(x) {
      // implementation #1
    }
  },
  "alternative" => {
    pub fn h(x) {
      // implementation #2
    }
  }
}

This (hypothetical) macro would allow me to have a cargo script which would:

  1. Compile the code that's in the "default" branch (thus generating a function h(x) with one implementation). The code inside the branch doesn't even need to be a full function; it could be part of the body of a function.
  2. Run the tests to see if everything passes
  3. Run some benchmarks with this implementation (note that we wouldn't be benchmarking the function h(x) directly; we would be benchmarking the functions which we actually care about and which may depend on h(x))
  4. Save the result from these benchmarks somehow
  5. Compile the code that's in the "alternative" branch (thus generating a function h(x) with a different implementation)
  6. Run the tests (to see if both versions of the code are compatible)
  7. Run the same benchmarks again and save the result
  8. Show the benchmark results side by side

I have implemented this successfully in Elixir, which makes it very easy to run arbitrary code at compile time and recompile the project at runtime.

Would this be possible in Rust?

Maybe you can write something useful with Features - The Cargo Book and then use cfg to switch between the active features.
You can specify the feature you want to have active on the command line on each call to cargo, and that should work with whatever benchmarking harness you use.
Saving and comparing the results depends on the benchmarking harness you choose, but one of the most popular ones - criterion - supports it. Command-Line Options - Criterion.rs Documentation
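The feature-switch idea above can be sketched like this (all the function bodies and the `alternative` feature name are illustrative; the feature would be declared in Cargo.toml and selected with `cargo bench --features alternative`):

```rust
// Assumes Cargo.toml declares:
//   [features]
//   alternative = []

// Default implementation, compiled unless the feature is enabled.
#[cfg(not(feature = "alternative"))]
pub fn h(x: u64) -> u64 {
    x + 1 // implementation #1
}

// Alternative implementation, compiled only with the feature enabled.
#[cfg(feature = "alternative")]
pub fn h(x: u64) -> u64 {
    x.wrapping_add(1) // implementation #2
}

fn g(x: u64) -> u64 {
    x * 2
}

fn f(x: u64) -> u64 {
    x - 1
}

// The function we actually care about benchmarking.
pub fn p(x: u64) -> u64 {
    f(g(h(x)))
}

fn main() {
    println!("{}", p(10));
}
```

Because the selection happens at compile time, `h` is a plain function in each build and the optimizer sees exactly one implementation, just as it would in production.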


Use criterion. It automatically compares with a previous run, which is good enough when switching back and forth. You can also save the old version's benchmark as a baseline while you experiment with a new one.

For observing small performance changes you will need criterion's warmup and outlier filtering.

Another option for checking small changes is looking at the assembly via cargo asm. It supports an --mca option that runs llvm-mca, which tries to estimate the cost/throughput of the code.


+1 to this. Once you're looking at something that's maybe a dozen lines of ASM, actually measuring it in a benchmark is nearly impossible. It's especially hard to measure something meaningful if it uses memory at all. I'm not even sure it's meaningfully possible inside a normal OS.

Down at that level, look at the --timeline view from MCA. Don't even try to phrase it in terms of time, because that's meaningless at that level.


One more thought: Parameterize your function under test (p(x)) so that h(x) can be chosen by the caller. Then run both implementations under criterion and graph how they compare to one another directly (e.g. with varying x), rather than how the changes compare over previous runs.
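A minimal sketch of that parameterization (function names and bodies are made up for illustration): p takes the h implementation as a generic parameter, so each variant is monomorphized and inlined much like a direct call would be.

```rust
fn g(x: u64) -> u64 {
    x * 2
}

fn f(x: u64) -> u64 {
    x + 3
}

// p is generic over the h implementation supplied by the caller.
fn p<H: Fn(u64) -> u64>(h: H, x: u64) -> u64 {
    f(g(h(x)))
}

// Two candidate implementations to compare.
fn h1(x: u64) -> u64 {
    x + 1
}

fn h2(x: u64) -> u64 {
    x.checked_add(1).unwrap()
}

fn main() {
    // Both variants can now be benchmarked side by side in a single run,
    // e.g. as two criterion bench functions.
    println!("{}", p(h1, 10));
    println!("{}", p(h2, 10));
}
```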

This is an interesting idea, but it only works if I'm calling h(x) directly from p(x). If the call to h(x) is nested, it will require parameterizing a large number of functions, which is inconvenient.

Yes, the same parameterization is implied across the entire call stack. It has additional boilerplate and monomorphization costs.

Convenience is low priority IMHO, especially when you are in the midst of profiling and optimizing. Once you've settled on a good implementation, strip the boilerplate.

Or even simpler, use git (branches, stash, etc.) to switch between implementations for comparing over previous runs.

If you're just looking for a better cfg! macro, the most popular one is cfg-if — Rust library // Lib.rs
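For reference, the built-in cfg! macro that cfg-if improves upon can already do a runtime-free switch, with the limitation that both branches must type-check (the feature name here is illustrative):

```rust
// cfg! expands to a compile-time bool, so the unused branch is
// optimized away; unlike #[cfg(...)], both branches must still compile.
fn h(x: u64) -> u64 {
    if cfg!(feature = "alternative") {
        x.wrapping_add(1) // implementation #2
    } else {
        x + 1 // implementation #1
    }
}

fn main() {
    println!("{}", h(41));
}
```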