Another performance regression, 1.78.0 -> 1.79.0

While working on a solution for AoC 2023 Day 9, I updated the Rust compiler on Ubuntu 24.04 from 1.78.0 to 1.79.0 and saw a drastic performance regression in the generated code on Linux:

tallinn@ubuntu-24-04:~/Development/day9$ cargo +1.78.0 bench rust
...
criterion/p1-rust       time:   [90.466 µs 90.645 µs 90.844 µs]
                        change: [-0.0408% +0.3089% +0.7077%] (p = 0.09 > 0.05)
                        No change in performance detected.
Found 6 outliers among 100 measurements (6.00%)
  5 (5.00%) high mild
  1 (1.00%) high severe

criterion/p2-rust       time:   [96.280 µs 96.485 µs 96.715 µs]
                        change: [+0.9550% +1.3432% +1.7217%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 8 outliers among 100 measurements (8.00%)
  6 (6.00%) high mild
  2 (2.00%) high severe
...
Timer precision: 41 ns
divan        fastest       │ slowest       │ median        │ mean          │ samples │ iters
╰─ rust                    │               │               │               │         │
   ├─ part1  93.58 µs      │ 146.8 µs      │ 98.39 µs      │ 102 µs        │ 100     │ 100
   ╰─ part2  99.25 µs      │ 258.6 µs      │ 105.6 µs      │ 113 µs        │ 100     │ 100

tallinn@ubuntu-24-04:~/Development/day9$ cargo bench rust
...
criterion/p1-rust       time:   [151.73 µs 153.45 µs 155.71 µs]
                        change: [+67.935% +69.612% +71.514%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
  4 (4.00%) high mild
  5 (5.00%) high severe

criterion/p2-rust       time:   [164.36 µs 165.13 µs 166.01 µs]
                        change: [+71.578% +72.535% +73.486%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild
...
Timer precision: 41 ns
divan        fastest       │ slowest       │ median        │ mean          │ samples │ iters
╰─ rust                    │               │               │               │         │
   ├─ part1  151.1 µs      │ 504.4 µs      │ 157 µs        │ 166.3 µs      │ 100     │ 100
   ╰─ part2  161.5 µs      │ 524.5 µs      │ 168.4 µs      │ 175.5 µs      │ 100     │ 100

The regression is less severe on macOS, but still noticeable (at least with criterion):

tallinn@Mac-mini-M1 day9 % cargo +1.78.0 bench rust 
...
criterion/p1-rust       time:   [108.40 µs 109.65 µs 111.06 µs]

criterion/p2-rust       time:   [107.86 µs 109.02 µs 110.30 µs]
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
...
Timer precision: 41 ns
divan        fastest       │ slowest       │ median        │ mean          │ samples │ iters
╰─ rust                    │               │               │               │         │
   ├─ part1  104.1 µs      │ 177.7 µs      │ 123.2 µs      │ 121.2 µs      │ 100     │ 100
   ╰─ part2  107.9 µs      │ 157.9 µs      │ 114.1 µs      │ 118.1 µs      │ 100     │ 100

tallinn@Mac-mini-M1 day9 % cargo bench rust 
...
criterion/p1-rust       time:   [116.30 µs 118.43 µs 121.21 µs]
                        change: [+5.4095% +7.8137% +10.814%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) high mild
  2 (2.00%) high severe

criterion/p2-rust       time:   [116.86 µs 117.68 µs 118.55 µs]
                        change: [+6.4280% +7.7616% +9.1637%] (p = 0.00 < 0.05)
                        Performance has regressed.
...
Timer precision: 41 ns
divan        fastest       │ slowest       │ median        │ mean          │ samples │ iters
╰─ rust                    │               │               │               │         │
   ├─ part1  111.8 µs      │ 243.3 µs      │ 118.2 µs      │ 125.9 µs      │ 100     │ 100
   ╰─ part2  113 µs        │ 186.6 µs      │ 119.3 µs      │ 119.8 µs      │ 100     │ 100

The code is at https://github.com/tallinn1960/day9.

Why is that? The last time I noticed a performance regression, when updating from 1.77.0 to 1.78.0, a code change was suggested here that restored the performance of my code. Is there a similar change that would help in this case? Or is this a compiler regression bug?

This is on a Mac mini M1 with Ubuntu 24.04 Linux running in a Parallels VM.

Have you run it in a profiler? Which part got slower?

Why are you asking? The profiler output is all there.

I noticed that the regression is less severe when I manually mark some functions as #[inline]. That wasn't necessary with 1.78.0.

Even with inlining there is still a regression of about 10% on Linux, whereas on macOS the regression vanishes. I am also surprised that the same Rust code is slower on macOS than on Linux on the same machine.
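
For readers following along, this is roughly what marking small helpers as #[inline] looks like. It is only a sketch: the function names `differences`, `next_value` and `part1` and their bodies are assumptions made up for illustration, not the code from the linked repository. Note that #[inline] is a hint to the compiler, not a guarantee that the call is inlined.

```rust
// Hypothetical stand-ins for the helpers of a Day 9 solution.
// Names and bodies are assumptions, not the code from the repository.

/// Differences between neighbouring values of a row.
#[inline]
fn differences(values: &[i64]) -> Vec<i64> {
    values.windows(2).map(|w| w[1] - w[0]).collect()
}

/// Extrapolates the next value of one history by summing the last
/// element of every difference row until a row is all zeros.
#[inline]
fn next_value(history: &[i64]) -> i64 {
    let mut row = history.to_vec();
    let mut sum = 0;
    while row.iter().any(|&v| v != 0) {
        // `row` is non-empty here, because `any` found a non-zero value.
        sum += row[row.len() - 1];
        row = differences(&row);
    }
    sum
}

/// Parses one history per line and sums the extrapolated values.
pub fn part1(input: &str) -> i64 {
    input
        .lines()
        .map(|line| {
            let history: Vec<i64> = line
                .split_whitespace()
                .filter_map(|token| token.parse().ok())
                .collect();
            next_value(&history)
        })
        .sum()
}
```

Whether the hint changes anything depends on the compiler version and the call site, so it is worth benchmarking with and without it.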

Did I miss it somewhere? I see benchmark output, but not something like a profiler flamegraph.

Can you minimize the example? In the repo I see multiple crates and Swift involved too.

Having a single file/function would help analyze assembly output and LLVM IR.
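
A single-file reduction could look something like the sketch below. The names `extrapolate_sum` and `time_once` are hypothetical and the body is only a guess at the shape of a Day 9 hot loop, not the code from the repository. The #[inline(never)] attribute is there so the function stays a separate symbol that is easy to find in the output of something like `rustc -O --crate-type lib --emit asm,llvm-ir`, and `std::hint::black_box` discourages the optimizer from removing the call.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical hot function: sums the extrapolated next value of
/// every history, computing the difference rows in place.
#[inline(never)]
pub fn extrapolate_sum(histories: &[Vec<i64>]) -> i64 {
    let mut total = 0;
    for history in histories {
        let mut row = history.clone();
        while row.iter().any(|&v| v != 0) {
            // `row` is non-empty here, because `any` found a non-zero value.
            total += row[row.len() - 1];
            // Overwrite the row with its differences, then drop the
            // stale last element.
            for i in 0..row.len() - 1 {
                row[i] = row[i + 1] - row[i];
            }
            row.pop();
        }
    }
    total
}

/// Runs the function once and returns the result with the elapsed time.
/// A single run is noisy; this is only meant as a quick sanity check.
pub fn time_once(histories: &[Vec<i64>]) -> (i64, Duration) {
    let start = Instant::now();
    let result = extrapolate_sum(black_box(histories));
    (black_box(result), start.elapsed())
}
```

Compiling the same file with both toolchains and diffing the emitted assembly or LLVM IR for that one symbol is usually much easier than comparing whole crates.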

Will do tomorrow. I will create a branch that removes all the Swift code and reduces the source to a minimum. I will post here when it is done.

That went faster than I expected. There is now a branch "regression" that contains a minimized version of the benchmarked Rust code.

What results does the benchmark give when you isolate the piece of code affected by the regression (without the other crates)? :thinking:

About 10% regression for each part. If I mark the two inner functions of the loop as #[inline], the regression drops to 4%.

Can you extract the code from its functions and put a POC on the playground so that we can inspect it with you? :pray: