Benchmarks game regressions


#1

The switch from rustc 1.18 to rustc 1.19.0 has caused some performance regressions in the Benchmarks Game (binarytrees, mandelbrot, regex and other benchmarks also regressed, so Rust has now moved behind C++ and is the third-fastest language):
http://benchmarksgame.alioth.debian.org/u64q/rust.html

In particular, this one is now much slower (three times slower?):
http://benchmarksgame.alioth.debian.org/u64q/knucleotide.html


#2

That’s unfortunate. Are there any GitHub tracking issues for this?


#3

Is there some way for us to automatically run these tests on every commit, to quickly find out which commit(s) caused the regressions? My best guess is an LLVM regression rather than a change in Rust itself. Running a test on the commit before the LLVM upgrade and on the commit after it would show whether that’s the cause.
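As a rough illustration of the idea, here is a minimal sketch of bisecting an ordered list of commits for the first one that regressed. It assumes the commits are ordered oldest to newest and that every commit after the first bad one is also bad. The commit labels, timings and the 1.5x threshold are made up for the example, and `first_bad` is a hypothetical helper, not part of any existing tool.

```rust
/// Returns the index of the first element for which `is_bad` is true.
/// Assumes the slice is ordered oldest to newest and that once a commit
/// is bad, every later commit is bad too.
fn first_bad<T>(commits: &[T], is_bad: impl Fn(&T) -> bool) -> Option<usize> {
    let (mut lo, mut hi) = (0, commits.len());
    while lo < hi {
        let mid = lo + (hi - lo) / 2;
        if is_bad(&commits[mid]) {
            // The first bad commit is at `mid` or earlier.
            hi = mid;
        } else {
            // Everything up to and including `mid` is good.
            lo = mid + 1;
        }
    }
    if lo < commits.len() { Some(lo) } else { None }
}

fn main() {
    // Hypothetical (commit label, benchmark seconds) pairs, invented for illustration.
    let runs = [("a1", 5.1), ("b2", 5.0), ("c3", 5.2), ("d4", 15.3), ("e5", 15.1)];
    let baseline = runs[0].1;
    // Treat anything more than 1.5x slower than the oldest commit as regressed.
    match first_bad(&runs, |&(_, secs)| secs > baseline * 1.5) {
        Some(i) => println!("first regressed commit: {}", runs[i].0),
        None => println!("no regression found"),
    }
}
```

With N commits this needs roughly log2(N) benchmark runs instead of N, which matters when each run means building a compiler.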


#4

Probably so. I missed this bit, https://github.com/rust-lang/rust/pull/42948, which is a change to the LLVM used by Rust.


#5

There are also changes like https://github.com/rust-lang/rust/pull/43367 coming down the pike, and I’m not sure they won’t cause regressions either (there’s no discussion of performance testing in that issue :thinking:).


#6

I bet @Mark_Simulacrum has some thoughts on this. Maybe these benchmarks could be integrated into rust-perf to track them going forward? As for figuring out the current situation after the fact, @Mark_Simulacrum can do better than running the benchmarks on every commit: https://github.com/Mark-Simulacrum/bisect-rust


#7

This was from the recent past:


#8

It’s not clear that it is a performance regression for knucleotide. Rather, four of the submissions no longer build, including the OrderMap submission which I think was the best performing (?). It looks like some inability to find the futures crate. Not sure what happened there.

http://benchmarksgame.alioth.debian.org/u64q/program.php?test=knucleotide&lang=rust&id=6

Edit: sorry, it says in the output what the problem is:

error[E0464]: multiple matching crates for `futures`

where it finds both

/usr/local/src/rust-libs/libfutures-1c1e2a4c26659c57.rlib
/usr/local/src/rust-libs/libfutures-17c2f0d76db207b9.rlib

shrug.
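For anyone wanting to check a library directory for this kind of clash, here is a small sketch that groups rlib file names by crate name and reports crates with more than one candidate. It assumes file names of the form `lib<name>-<hash>.rlib`, as in the two paths above, and that the hash suffix contains no hyphen. `parse_rlib` and `duplicate_crates` are hypothetical helpers written for this example.

```rust
use std::collections::BTreeMap;

/// Splits a file name like `libfutures-1c1e2a4c26659c57.rlib` into
/// ("futures", "1c1e2a4c26659c57"). Returns None for names of any other shape.
fn parse_rlib(file_name: &str) -> Option<(&str, &str)> {
    let stem = file_name.strip_prefix("lib")?.strip_suffix(".rlib")?;
    // Split at the last '-', assuming the hash suffix itself has no hyphen.
    stem.rsplit_once('-')
}

/// Maps each crate name to its hashes, keeping only crates that appear
/// more than once in `paths`.
fn duplicate_crates<'a>(paths: &[&'a str]) -> BTreeMap<&'a str, Vec<&'a str>> {
    let mut by_crate: BTreeMap<&str, Vec<&str>> = BTreeMap::new();
    for &path in paths {
        // Keep only the final path component.
        let file_name = path.rsplit('/').next().unwrap_or(path);
        if let Some((name, hash)) = parse_rlib(file_name) {
            by_crate.entry(name).or_default().push(hash);
        }
    }
    by_crate.retain(|_, hashes| hashes.len() > 1);
    by_crate
}

fn main() {
    let paths = [
        "/usr/local/src/rust-libs/libfutures-1c1e2a4c26659c57.rlib",
        "/usr/local/src/rust-libs/libfutures-17c2f0d76db207b9.rlib",
    ];
    for (name, hashes) in duplicate_crates(&paths) {
        println!("{} has {} candidates: {:?}", name, hashes.len(), hashes);
    }
}
```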


#9

I see. Perhaps it’s because he’s not using Cargo?


#10

There are a few open issues for benchmarking commits to rust-lang/rust with regard to performance. https://perf.rust-lang.org is intended for compiler benchmarks (i.e., compile-time performance), and https://github.com/rust-lang/rust/issues/31265 tracks adding runtime benchmark (i.e., cargo bench) results to either perf.rlo or another similar site. perf.rlo isn’t currently tracking timings correctly, due to changes in how -Ztime-passes works and its soft-deprecation (i.e., it’s becoming less representative of how the compiler works internally as we move towards a more incremental compiler). This should be fixed as soon as we work out a concrete plan for doing so.

My bisection tool works quite well for finding commits which regressed by introducing a new error or removing an error (i.e., changing stderr output to have / not have some string), but it isn’t a great fit for “manual” bisection where a human must decide whether a commit is good or bad. I hope that can improve with time. There’s certainly potential for user contributions to bring it closer to how tooling like git bisect works for the use cases that git bisect helps with. The ergonomics could also be improved, since they’re not currently ideal and the setup is somewhat non-trivial.
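To make the stderr-based decision concrete, here is a minimal sketch of classifying one toolchain run as good or bad by whether its stderr contains a given string. This is not how bisect-rust is implemented. `Verdict`, `classify` and `check` are hypothetical names for this example, and the command run in `main` is only a placeholder.

```rust
use std::io;
use std::process::Command;

#[derive(Debug, PartialEq, Clone, Copy)]
enum Verdict {
    Good,
    Bad,
}

/// A run counts as bad if its stderr contains `needle`.
fn classify(stderr: &str, needle: &str) -> Verdict {
    if stderr.contains(needle) {
        Verdict::Bad
    } else {
        Verdict::Good
    }
}

/// Runs `program` with `args` and classifies the run by its stderr output.
fn check(program: &str, args: &[&str], needle: &str) -> io::Result<Verdict> {
    let output = Command::new(program).args(args).output()?;
    Ok(classify(&String::from_utf8_lossy(&output.stderr), needle))
}

fn main() {
    // Classify a captured error message directly.
    let stderr = "error[E0464]: multiple matching crates for `futures`";
    println!("{:?}", classify(stderr, "E0464"));

    // Placeholder invocation; a real bisection would build a test crate
    // with each candidate toolchain instead.
    match check("rustc", &["--version"], "E0464") {
        Ok(verdict) => println!("rustc --version: {:?}", verdict),
        Err(err) => println!("could not run rustc: {}", err),
    }
}
```

A predicate like this can be decided without a human in the loop, which is why it suits automated bisection better than judging benchmark timings by eye.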