Performance regression 1.77.0 -> 1.78.0?

tallinn1960 · May 28, 2024, 4:44pm

I noticed that after switching from rustc 1.77.0 to 1.78.0, some code I have apparently runs significantly slower (fully optimised build) than before.

The code is in git@github.com:tallinn1960/rain_collected.git. It has a criterion benchmark, which shows that the function compute_rain_collected runs significantly slower when compiled by 1.78.0, a 176% regression.

compute_rain_collected_trap/compute_rain_collected
                        time:   [23.291 ms 23.338 ms 23.386 ms]
                        change: [+175.17% +176.07% +176.89%] (p = 0.00 < 0.05)
                        Performance has regressed.

Is this a criterion flaw or is it a compiler regression? All other tested functions in that benchmark perform the same with both compiler versions though.

This is on a Mac mini M1.
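One rough way to cross-check a criterion result is a std-only timing loop compiled with each toolchain. The sketch below is only an illustration: `time_it` and `make_terrain` are hypothetical helper names, not part of the rain_collected repository, and the input distribution is an assumption rather than what the real benchmark uses.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical helper: calls `f` `iters` times and returns the last result
/// together with the mean wall-clock time per call.
pub fn time_it<R>(iters: u32, mut f: impl FnMut() -> R) -> (R, Duration) {
    assert!(iters > 0, "need at least one iteration");
    let start = Instant::now();
    // `black_box` keeps the optimizer from discarding the computed value.
    let mut last = black_box(f());
    for _ in 1..iters {
        last = black_box(f());
    }
    (last, start.elapsed() / iters)
}

/// Hypothetical helper: deterministic pseudo-random terrain from a simple
/// linear congruential generator, so every toolchain sees the same input.
/// Elevations fall in 0..1000 (an assumption, not the benchmark's data).
pub fn make_terrain(len: usize) -> Vec<i64> {
    let mut seed: u64 = 0x2545_F491_4F6C_DD1D;
    (0..len)
        .map(|_| {
            seed = seed
                .wrapping_mul(6364136223846793005)
                .wrapping_add(1442695040888963407);
            (seed >> 33) as i64 % 1000
        })
        .collect()
}
```

Something like `time_it(100, || compute_rain_collected(black_box(&terrain)))` could then be run from a small binary built with `cargo +1.77.0 run --release` and `cargo +1.78.0 run --release`, assuming both toolchains are installed through rustup. A loop like this is much cruder than criterion, so it is only good for confirming a large difference such as the one reported here.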

kornel · May 28, 2024, 5:10pm

Try cargo-show-asm to see if the generated code has changed. Maybe that will have a clue why.

Bruecki · May 28, 2024, 5:20pm

What target triple are you using?

x86 Compiler Explorer
aarch64 Compiler Explorer

For aarch64, the 1.78 version has fewer branches which should be beneficial but might be worse if the branch predictor does a good job for the use case.

scottmcm · May 28, 2024, 5:54pm

One thing you might do to make LLVM's life easier:

Rather than iterating over &i64,

github.com/tallinn1960/rain_collected/blob/77d0998aa977af7d5d76c2d2a6ceb04e978ad1ae/trap_rust/src/lib.rs#L62

/// The solution is based on the observation that the terrain with the water trapped in
/// it forms a stair going up to the maximum elevation of the terrain coming from the
/// left and going down after that maximum to the right. So the goal is to calculate
/// the size of the step corresponding to a given spot of the terrain left and right
/// of the maximum elevation. The water collected on that spot is the difference
/// between this stepsize and the actual elevation of the spot. If the maximum elevation
/// is the last spot of the terrain, we can calculate the water collected by iterating
/// the terrain from left to right and applying a fold operation that keeps track of the
/// stepsize and the water collected.
pub fn compute_rain_collected(height: &[i64]) -> u64 {
    let mut height = height.into_iter();

    std::iter::repeat(())
        // We reorder the sequence of elevations by taking values
        // from both ends of the terrain on a minimum first basis,
        // advancing the iterator that points to the smaller value.
        // This way we are guaranteed to have the maximum elevation
        // as the last spot.
        .scan((height.next(), height.next_back()), |state, _| {
            if let (Some(left), Some(right)) = *state {
                if left <= right {

instead do

let mut height = height.iter().copied();

(And correspondingly change |acc, &x| to |acc, x|.)

LLVM is much happier thinking about simple no-provenance no-races values like i64 rather than pointers. So I'd be curious if that changes things.
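To make the suggested change concrete, here is a minimal sketch of the same fold written both ways. It covers only the simplified case described in the doc comment (maximum elevation at the last spot), not the full two-ended function, and the names `collect_by_ref` and `collect_by_value` are made up for illustration.

```rust
/// Fold over `&i64` items: the closure has to destructure each reference
/// with `&x`, as in the original code.
/// Only valid when the maximum elevation is the last element.
pub fn collect_by_ref(height: &[i64]) -> u64 {
    height
        .iter()
        .fold((i64::MIN, 0u64), |acc, &x| {
            // acc.0 is the current step size, acc.1 the water so far.
            let stepsize = x.max(acc.0);
            (stepsize, acc.1 + (stepsize - x) as u64)
        })
        .1
}

/// Same fold, but `.copied()` turns the items into plain `i64` values,
/// so the closure takes `x` directly.
pub fn collect_by_value(height: &[i64]) -> u64 {
    height
        .iter()
        .copied()
        .fold((i64::MIN, 0u64), |acc, x| {
            let stepsize = x.max(acc.0);
            (stepsize, acc.1 + (stepsize - x) as u64)
        })
        .1
}
```

Both functions compute the same result; whether the by-value form actually optimises better depends on the compiler version and target, which is exactly what is in question here.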

drewtato · May 28, 2024, 10:40pm

I tried this on x86_64-pc-windows-msvc and the same regression occurs. Nightly (2024-05-27) is just as bad, and 1.79 beta is in the middle.

1.77    443.545µs
1.78    872.203µs
beta    657.485µs
nightly 876.462µs

The order changes a lot depending on how many iterations occur.

Switching to height.iter().copied() made 1.77, 1.78, and nightly take about the same as the previous beta result, and beta now takes about the same as the previous nightly/1.78 results. In my experience copied sucks, but I'm always hopeful it'll be fixed someday.

I'm gonna turn this into a procedural loop next and see how that goes.

Edit: while loop is worse, but when the iterator is copied it's fast in everything except nightly!

pub fn compute_rain_collected(height: &[i64]) -> u64 {
    let mut height = height.iter().copied();

    let mut state = (height.next(), height.next_back());
    let mut acc = (i64::MIN, 0u64);
    while let (Some(left), Some(right)) = state {
        let x = if left <= right {
            state = (height.next(), Some(right));
            left
        } else {
            state = (Some(left), height.next_back());
            right
        };

        let stepsize = x.max(acc.0);
        acc = (stepsize, acc.1 + (stepsize - x) as u64);
    }
    acc.1
}
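When rewriting a function like this for speed, it can help to cross-check it against a straightforward reference. The sketch below repeats the while-loop version under the hypothetical name `rain_two_ended` and compares it with `rain_reference`, a textbook prefix-maximum/suffix-maximum formulation that is not taken from the repository.

```rust
/// The while-loop version from above, renamed for this sketch.
pub fn rain_two_ended(height: &[i64]) -> u64 {
    let mut height = height.iter().copied();

    let mut state = (height.next(), height.next_back());
    let mut acc = (i64::MIN, 0u64);
    while let (Some(left), Some(right)) = state {
        // Consume the smaller of the two ends, so the value still held
        // is never lower than anything processed so far.
        let x = if left <= right {
            state = (height.next(), Some(right));
            left
        } else {
            state = (Some(left), height.next_back());
            right
        };

        let stepsize = x.max(acc.0);
        acc = (stepsize, acc.1 + (stepsize - x) as u64);
    }
    acc.1
}

/// Reference: the water above each spot is
/// min(highest elevation to the left, highest to the right) - elevation,
/// where both maxima include the spot itself.
pub fn rain_reference(height: &[i64]) -> u64 {
    // Prefix maxima, left to right.
    let mut left_max = Vec::with_capacity(height.len());
    let mut running = i64::MIN;
    for &h in height {
        running = running.max(h);
        left_max.push(running);
    }

    // Suffix maxima are tracked on the fly while walking right to left.
    let mut total = 0u64;
    let mut right_max = i64::MIN;
    for i in (0..height.len()).rev() {
        right_max = right_max.max(height[i]);
        total += (left_max[i].min(right_max) - height[i]) as u64;
    }
    total
}
```

The reference needs an extra allocation and two passes, so it is meant for tests only, not as a replacement for the single-pass version.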

tallinn1960 · May 30, 2024, 4:11pm

Well, the code has changed, but I do not understand enough of arm64 code to understand why it got worse:

The default target triple for Rust on M1 is aarch64-apple-darwin.

tallinn1960 · May 30, 2024, 4:46pm

I mark this as a solution, as it restores the original speed of the solution on my machine.

Seems like 1.77.0 did that optimisation on its own, and 1.78.0 doesn't any longer? I wish I understood more of arm64 code.

Thanks for those responses.

scottmcm · May 30, 2024, 5:50pm

There's some interesting pointer provenance questions about slice iterators that are coming up now that LLVM is starting to try to actually fix a bunch of long-time bugs around it.

You might be interested in this zulip thread about a change in LLVM 19 (https://rust-lang.zulipchat.com/#narrow/stream/187780-t-compiler.2Fwg-llvm/topic/pointer.20equality.20propagation/near/441362175) or this speculative PR about giving them only a single provenance (rust-lang/rust pull request #122971, "Make slice iterators carry only a single provenance").

