Is there an established way to report performance regressions introduced by applying Profile Guided Optimization (PGO) to a Rust program?
PGO optimizes a program based on a profile collected from a sample workload. So a regression could simply be a sample workload issue, in which case we can do nothing but use a more "realistic" workload.
But there are situations where performance regresses after applying PGO even on the same workload that was used for training. Two examples are the QOA audio format Rust library (GitHub - rafaelcaricio/qoaudio: Quite OK Audio format in Rust.), which regressed by about 16%, and simd-json (GitHub - simd-lite/simd-json: Rust port of simdjson), which regressed by about 20-30%. These libraries have dedicated benches, and after optimizing with PGO via cargo-pgo (cargo pgo instrument bench followed by cargo pgo optimize bench) I got a regression compared to plain cargo bench. So my guess is that it could be a compiler bug or a missed optimization opportunity.
Should such regressions be reported to the Rustc project directly? I checked the PGO-related issues in the Rustc repository and didn't find any examples of such reports. Any advice would be appreciated.
@Kobzol gently pinging you here since you are a PGO expert in Rustc.
Hi, I wouldn't really say that I'm a PGO expert. In general, PGO is an optimization, and it's not guaranteed to always produce faster code, just as compiling with -O3 can be slower than compiling with -O2 in C/C++ land. That being said, missing optimizations can definitely be reported as issues/bugs.
I would perhaps first try to perform PGO on some workload outside of (micro)benchmarks, as these might not be as representative, and I don't know how well it works to actually run PGO on them. It could also be some bug/issue with cargo-pgo
Another thing is that microbenchmarks can be quite treacherous, and it's not uncommon that you can get a double digit % regression when you run the exact same code twice in a row. I'm not sure how much you were trying to measure the inherent background noise, but this is also something to consider.
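One way to gauge that background noise is an A/A comparison: time the same unmodified binary twice and see how far apart the two runs land. Below is a minimal sketch of that idea, assuming the timings have already been collected by some benchmark harness. The sample values are made up for illustration, and `median` and `relative_diff_percent` are hypothetical helper names, not part of any library.

```rust
/// Median of a set of timing samples (e.g. nanoseconds per iteration).
fn median(samples: &[f64]) -> f64 {
    assert!(!samples.is_empty(), "need at least one sample");
    let mut sorted = samples.to_vec();
    sorted.sort_by(|a, b| a.partial_cmp(b).expect("samples must not be NaN"));
    let mid = sorted.len() / 2;
    if sorted.len() % 2 == 0 {
        // Even count: average the two middle values.
        (sorted[mid - 1] + sorted[mid]) / 2.0
    } else {
        sorted[mid]
    }
}

/// Relative difference of `candidate` against `baseline`, in percent.
/// A positive value means `candidate` is slower.
fn relative_diff_percent(baseline: f64, candidate: f64) -> f64 {
    (candidate - baseline) / baseline * 100.0
}

fn main() {
    // Made-up timings from two runs of the *same* binary (illustration only).
    let run_a = [102.0, 98.0, 100.0, 101.0, 99.0];
    let run_b = [111.0, 109.0, 110.0, 112.0, 108.0];

    let noise = relative_diff_percent(median(&run_a), median(&run_b));
    println!("A/A difference: {noise:.1}%");
}
```

If two runs of identical code already differ by a double-digit percentage, a PGO vs. non-PGO difference of similar size on that machine probably says little on its own.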
Yep, I understand that PGO is not guaranteed to improve on the -O2/-O3 optimization levels. But I would expect that if the same workload is used for both the training and benchmarking phases, the results should be at least the same and ideally better with PGO. So if the PGO version is slower on the same workload, I'd consider it an optimizer bug that should be reported upstream (Rustc) and then, probably, to LLVM.
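To make that expectation concrete, here is a rough sketch of how a baseline vs. PGO result could be classified before filing a report. It assumes the noise floor was measured separately on the same machine. The `Verdict` enum, the `classify` function, and all numbers are hypothetical and only illustrate the reasoning.

```rust
/// Outcome of comparing a PGO build against the plain release build.
#[derive(Debug, PartialEq)]
enum Verdict {
    Improvement,
    WithinNoise,
    Regression,
}

/// Classify a PGO timing against the baseline timing.
/// `noise_floor_percent` is an assumed, separately measured run-to-run variation.
fn classify(baseline_ns: f64, pgo_ns: f64, noise_floor_percent: f64) -> Verdict {
    // Positive change means the PGO build is slower than the baseline.
    let change = (pgo_ns - baseline_ns) / baseline_ns * 100.0;
    if change > noise_floor_percent {
        Verdict::Regression
    } else if change < -noise_floor_percent {
        Verdict::Improvement
    } else {
        Verdict::WithinNoise
    }
}

fn main() {
    // Hypothetical numbers: the PGO build is 16% slower, with an assumed 5% noise floor.
    let verdict = classify(1000.0, 1160.0, 5.0);
    println!("{verdict:?}");
}
```

Only a `Regression` verdict on the training workload itself would be a candidate for an upstream report under the reasoning above.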
I'm not sure how much you were trying to measure the inherent background noise, but this is also something to consider.
Usually, I mitigate these things with Criterion and multiple runs, so background noise is not a problem here.
I would perhaps first try to perform PGO on some workload outside of (micro)benchmarks, as these might not be as representative, and I don't know how well it works to actually run PGO on them. It could also be some bug/issue with cargo-pgo
So it seems to do something on my PC (using rustc 1.71.0).
That's strange. I reproduced your results on my Linux machine (AMD 5900x, Linux 6.4, Fedora 38), but on macOS (M1 Pro, Ventura 13.4) I still get the same performance decrease - Profile-Guided Optimization (PGO) results · Issue #5 · rafaelcaricio/qoaudio · GitHub. I don't know the reason for that - it's probably a good candidate for a "Wrong PGO optimizer decision on macOS" issue.
Hi, I wouldn't really say that I'm a PGO expert
Huh, don't be shy - I've seen all your PRs about PGO/BOLT to Rustc