Dramatic Increase in Compile Time with Fat LTO in Release Build: Causes and Troubleshooting?

We're working on a Rust project and have noticed a significant difference in compile times between the debug and release builds. When compiling to a debug binary, it completes in roughly 6 minutes. However, when we compile for a release build using fat LTO, the compile time dramatically increases to approximately 90 minutes. Notably, the LTO process alone accounts for more than 1 hour of this duration.
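For context, the fat LTO setting lives in the release profile in `Cargo.toml`. A sketch of that kind of profile (only `lto = "fat"` is the setting under discussion; the other line is an illustrative companion setting, not necessarily what we use):

```toml
# Cargo.toml -- illustrative release profile.
[profile.release]
lto = "fat"        # whole-program LTO: merges all crates into one LLVM module
codegen-units = 1  # often paired with fat LTO for maximum optimization
```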

What potential reasons might be causing this disparity in compile times? Are there any recommended methods or tools for troubleshooting this issue?

Fat LTO tends to have this effect. When we build Python (C) with LTO enabled on an embedded platform, it builds for several hours, as opposed to 20 minutes or so without LTO.

I believe it's in the nature of fat LTO rather than a problem to be solved.

Thank you for sharing your experience with LTO and Python (C) builds. I understand that fat LTO can naturally lead to longer compile times. However, in our other projects, the discrepancy between debug and LTO release compile times wasn't as pronounced. I'm keen to identify factors that could amplify this gap so that we might adjust our codebase to mitigate such extensive compile times.

I would suggest you experiment with using different linkers, though I'm not sure about LTO maturity with different linker alternatives out there. See Configuration - The Cargo Book
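For example, switching to `lld` on Linux might look like this in `.cargo/config.toml` (a sketch; the target triple is an assumption, and it's worth verifying how well your linker of choice handles LTO'd objects):

```toml
# .cargo/config.toml -- sketch: ask the C compiler driver to link with lld
[target.x86_64-unknown-linux-gnu]
rustflags = ["-C", "link-arg=-fuse-ld=lld"]
```

One caveat: with fat LTO most of the wall-clock time is spent in LLVM's optimization passes rather than in the final link step, so a faster linker may only shave off a modest amount.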

In my experience, that's just how fat LTO tends to be.

If you want most of the runtime perf benefit of fat LTO (and in some cases actually better runtime perf) with a much smaller compile-time hit, you should use ThinLTO instead. Fat LTO effectively merges all code into a single LLVM module and then optimizes and codegens it as a single unit, which means you get hit by optimizations that are quadratic in input size, and only a single CPU core can be used. ThinLTO, on the other hand, keeps all codegen units separate and optimizes them in parallel; it only has a single serial pass, where it collects and merges summaries of all codegen units to be used when optimizing the other codegen units. These summaries can, for example, contain the LLVM IR of inlining candidates and other information that helps optimizations. ThinLTO is capable of using all CPU cores for optimization, which makes it much faster.
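Concretely, that's a one-line change in the release profile in `Cargo.toml`:

```toml
[profile.release]
lto = "thin"  # ThinLTO: parallel per-CU optimization guided by merged summaries
```

It's worth measuring both compile time and runtime before and after, since the trade-off depends on the codebase.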


Compiler optimizations are computationally complex - that's why most compilers by default only apply them on a per-object basis. LTO applies optimizations across your entire codebase.
You should expect the increase in time to be superlinear. 90 minutes doesn't seem too bad to me; at least it finishes without OOMing.
