Enable thin-local LTO or set codegen-units to 1: which produces faster code?

The rustc manual says that codegen-units=1 disables thin-local LTO:

lto

  • When -C lto is not specified:
    • codegen-units=1: disable LTO.
    • opt-level=0: disable LTO.

codegen-units

This flag controls the maximum number of code generation units the crate is split into. It takes an integer greater than 0.

When a crate is split into multiple codegen units, LLVM is able to process them in parallel. Increasing parallelism may speed up compile times, but may also produce slower code. Setting this to 1 may improve the performance of generated code, but may be slower to compile.

The default value, if not specified, is 16 for non-incremental builds. For incremental builds the default is 256 which allows caching to be more granular.

So which of the two generates faster programs: enabling thin-local LTO, or setting codegen-units to 1?

You have to distinguish between LTO that happens when combining multiple codegen units of a single crate into one, and LTO that happens when linking separate crates together.

The first kind of LTO (what the manual calls thin-local LTO) only makes sense if you have more than one codegen unit, since otherwise there is nothing to link. You can only perform link-time optimization when you link something.

The second kind of LTO can be used even if codegen-units is 1, because it happens when linking separate crates together rather than during the compilation of a single crate. It's enabled by passing -C lto.
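
If you build with Cargo, the combinations discussed above could be written as profiles roughly like this. It's only a sketch: the profile names `cgu1` and `cgu1-fat` are made up for illustration, and the `[profile.release]` block just spells out what I understand Cargo's release defaults to be. It doesn't tell you which one is faster for your program; you'd still have to benchmark each build.

```toml
# Cargo.toml (sketch; the profile names `cgu1` and `cgu1-fat` are hypothetical)

# Release defaults written out explicitly: the crate is split into up to
# 16 codegen units, and thin-local LTO runs across them.
[profile.release]
opt-level = 3
codegen-units = 16
lto = false          # false = thin-local LTO only, not "no LTO at all"

# One codegen unit: there is nothing to link inside the crate,
# so thin-local LTO is skipped.
[profile.cgu1]
inherits = "release"
codegen-units = 1

# One codegen unit plus cross-crate ("fat") LTO, the equivalent of -C lto.
[profile.cgu1-fat]
inherits = "release"
codegen-units = 1
lto = "fat"
```

Each one can then be built with `cargo build --release`, `cargo build --profile cgu1` or `cargo build --profile cgu1-fat` and timed on your own workload.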
