I have been working through benchmarking with criterion. It's very good at showing me performance with a single code tree, but I am now experimenting with comparing different versions.
I am confused by the documentation which says:
--save-baseline <name> will compare against the named baseline, then overwrite it.
--baseline <name> will compare against the named baseline without overwriting it.
--load-baseline <name> will load the named baseline as the new data set rather than the previous baseline.
I can't understand what --baseline does compared to --load-baseline. Both seem to me to compare the new run against the named baseline.
By default, criterion compares the current run to the previous run.
--save-baseline <name> compares to the old version of <name>, then saves the current version as <name>, to let you keep track of a baseline "acceptable" performance.
--baseline <name> changes the comparison so that you use <name> instead of the previous run.
--load-baseline <name> changes the comparison so that you use the run called <name> instead of the current run.
The idea is that when you're at a good baseline, you use --save-baseline to remember it. You then use --baseline to compare not against the previous run (for incremental improvements), but against the baseline. And you use --load-baseline when the "new" side of the comparison should be a saved baseline rather than a fresh run of the current code.
A workflow with this looks like:
--save-baseline before-optimization to save the current state.
--baseline before-optimization to compare to the baseline, not the previous state.
--save-baseline after-optimization-type-one to save a state that reflects one path to faster running.
--save-baseline after-optimization-type-two to save a second optimized state with a conflicting path to faster running.
--load-baseline after-optimization-type-one --baseline after-optimization-type-two to skip running the code, and just compare the two optimized forms to decide which one to use as your new baseline.
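If it helps to see the rules above written down, here is a small model of them in plain Rust. To be clear, this is not criterion's code or API: Source, Flags, Plan and plan are names I made up for illustration, and the sketch only encodes the behaviour described in this answer. It also does not check which flag combinations the real command line accepts.

```rust
/// Where one side of the comparison comes from (hypothetical model type).
#[derive(Debug, Clone, PartialEq, Eq)]
enum Source {
    /// Measurements taken by running the benchmark now.
    CurrentRun,
    /// Measurements left behind by the previous run.
    PreviousRun,
    /// Measurements stored under a baseline name.
    Named(String),
}

/// The three flags from the question (hypothetical model type).
#[derive(Debug, Default)]
struct Flags {
    save_baseline: Option<String>,
    baseline: Option<String>,
    load_baseline: Option<String>,
}

/// What a single invocation would do under this model.
#[derive(Debug, PartialEq, Eq)]
struct Plan {
    new_data: Source,
    compared_against: Source,
    saved_as: Option<String>,
    runs_benchmark: bool,
}

fn plan(flags: &Flags) -> Plan {
    // --load-baseline swaps out the "new" side: a saved run instead of a fresh one.
    let new_data = match &flags.load_baseline {
        Some(name) => Source::Named(name.clone()),
        None => Source::CurrentRun,
    };
    // --baseline swaps out the "old" side. --save-baseline compares against the
    // old contents of the name it is about to overwrite.
    let compared_against = match (&flags.baseline, &flags.save_baseline) {
        (Some(name), _) => Source::Named(name.clone()),
        (None, Some(name)) => Source::Named(name.clone()),
        (None, None) => Source::PreviousRun,
    };
    Plan {
        new_data,
        compared_against,
        saved_as: flags.save_baseline.clone(),
        // Loading a saved run means there is nothing to measure.
        runs_benchmark: flags.load_baseline.is_none(),
    }
}

fn main() {
    // Last step of the workflow: compare the two saved optimizations.
    let flags = Flags {
        load_baseline: Some("after-optimization-type-one".to_string()),
        baseline: Some("after-optimization-type-two".to_string()),
        ..Flags::default()
    };
    println!("{:?}", plan(&flags));
}
```

Read this way, --baseline changes the "old" side of the comparison and --load-baseline changes the "new" side, which is why they can be combined in the last step of the workflow.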