Moving code from a module to its submodules significantly degrades performance

I'm encountering a weird situation. I have a very large module A which has lots of code. To make it cleaner, I moved some code from module A to its submodules A1, A2 and A3. However, this change degraded the performance of the code by 2-3%. Worse, the benchmarks of some functions in module B (a sibling of A) regressed by 10%. I compared the two branches multiple times and confirmed that no other changes were introduced. This does not make any sense to me. Has anyone run into a similar situation before?

  1. Rust 1.63.0.
  2. Benchmarked using criterion + criterion-perf-events, which counts CPU cycles instead of wall/CPU time. It gives stable results and has been reporting reasonable numbers so far.
  3. Sorry that I can't share the code here.

I doubt this is "weird"; rather, it is to be expected. Any change to the code, even a pure reorganization, can shift performance measurably. For example, the code may now have a different layout in memory, so that a hot path no longer fits in some memory cache.

It definitely sounds like a code layout / icache problem. I've heard that PGO can help with this kind of issue.

But why does it affect the performance of module B so much? I didn't make any changes to module B.

A quick sanity check: we are discussing a release build, correct?

I suspect adjusting codegen-units will help. The combination of breaking up your code and a codegen-units value greater than 1 means the optimizer has less visibility across the pieces.
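
One way this can play out, as a hedged sketch: rustc splits a crate into codegen units roughly along module lines, so a small helper that moves into a submodule may end up in a different unit from its callers. Marking it `#[inline]` is a hint that keeps its body available for inlining at those call sites. The module names `a` and `a1` mirror the thread; `scale` and `sum_scaled` are hypothetical stand-ins for the code that could not be shared, and whether this matters for the real code can only be confirmed by measuring.

```rust
// Hypothetical layout: `a` is the original module, `a1` an extracted submodule.
mod a {
    pub mod a1 {
        /// Small hot helper that was moved out of `a`.
        /// `#[inline]` is only a hint; it asks the compiler to make the body
        /// available for inlining even when the caller is compiled separately.
        #[inline]
        pub fn scale(x: u64) -> u64 {
            x.wrapping_mul(3).wrapping_add(1)
        }
    }

    /// Hot loop that stays in `a` and calls the moved helper.
    pub fn sum_scaled(values: &[u64]) -> u64 {
        values
            .iter()
            .map(|&v| a1::scale(v))
            .fold(0, u64::wrapping_add)
    }
}

fn main() {
    let total = a::sum_scaled(&[1, 2, 3]);
    println!("{total}");
}
```

Running this prints 21. Benchmarking with and without the attribute on the moved functions is one way to check whether lost inlining accounts for part of the regression.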

I'm using cargo bench, which uses the bench profile. That profile inherits from the release profile.

Thank you for bringing up this great point! I will try pinning codegen-units to a fixed value to see whether both branches then yield comparable performance.
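
For reference, a minimal sketch of that experiment in Cargo.toml, assuming the profiles are otherwise left at their defaults. Since the bench profile inherits from release, setting it once there should be enough:

```toml
# Cargo.toml
[profile.release]
# Compile each crate as a single codegen unit (the release default is 16),
# so the optimizer sees the whole crate at once.
codegen-units = 1
```

Build times will go up, so this is mostly useful as a diagnostic: if the gap between the two branches shrinks with this setting, the split across codegen units is a likely contributor.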

Dump the assembly of the two versions. See how they differ.

Profile both versions. See where the hot spots differ.

What else do you expect us to do for you, since you can't show the code?

If you've got 40 minutes, here's a good talk on code layout and related things, which Hyeonu mentioned as the likely culprit:

If you don't have 40 minutes, the short of it is that changing A can change B in ways that significantly affect its performance even if they're supposedly unrelated.