There are, in theory, two axes to how rustc handles inline functions, assuming that the purpose of inlining is to avoid the need for LTO:
- One is how rustc codegens functions into foreign crates. Currently #[inline] codegens it into all referencing codegen units. Another possibility is to codegen it into only one unit and have other units reference the original function from there.
- The next axis is whether rustc applies the inlinehint annotation in LLVM, or how aggressively we tell LLVM to inline the function. Note that this attribute only applies to functions in the same CGU.
For the first axis we have three options: don't codegen into downstream crates, codegen into one CGU of any referencing downstream crate, or codegen into all CGUs. For the second axis we have two options: either apply the attribute or not.
When codegen'ing into only one CGU, we would generally rely on ThinLTO to do the cross-CGU inlining when a crate runs its own ThinLTO passes.
So, personally, I think there's possible room for:
- #[inline] - codegen into all CGUs, inlinehint
- #[inline(probably)] - codegen into all CGUs, no inlinehint
- #[inline(onecgu)] - codegen into a single referencing CGU, inlinehint
- #[inline(maybe)] - codegen into a single referencing CGU, no inlinehint
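To make the design space concrete, here is a rough sketch that models the two axes as plain Rust enums and maps the four spellings from the list onto them. Everything in it (`Placement`, `Hint`, `InlinePolicy`, `policy_for`, `all_policies`) is a hypothetical name made up for illustration, not anything in rustc, and of the four spellings only `#[inline]` exists today.

```rust
/// Axis 1: where the function body gets codegen'd for downstream crates.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Placement {
    /// Not codegen'd downstream; callers reference the upstream symbol.
    UpstreamOnly,
    /// Codegen'd into one CGU of each referencing downstream crate.
    OneCgu,
    /// Codegen'd into every referencing CGU.
    AllCgus,
}

/// Axis 2: whether LLVM's `inlinehint` attribute is applied.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Hint {
    InlineHint,
    NoHint,
}

/// One point in the design space (hypothetical type, for illustration only).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct InlinePolicy {
    pub placement: Placement,
    pub hint: Hint,
}

/// Maps the four attribute spellings from the list above onto the two axes.
/// Any other spelling returns `None`; this is not a real attribute parser.
pub fn policy_for(attr: &str) -> Option<InlinePolicy> {
    let (placement, hint) = match attr {
        "#[inline]" => (Placement::AllCgus, Hint::InlineHint),
        "#[inline(probably)]" => (Placement::AllCgus, Hint::NoHint),
        "#[inline(onecgu)]" => (Placement::OneCgu, Hint::InlineHint),
        "#[inline(maybe)]" => (Placement::OneCgu, Hint::NoHint),
        _ => return None,
    };
    Some(InlinePolicy { placement, hint })
}

/// Enumerates every combination: 3 placements x 2 hint settings.
pub fn all_policies() -> Vec<InlinePolicy> {
    let mut out = Vec::new();
    for placement in [Placement::UpstreamOnly, Placement::OneCgu, Placement::AllCgus] {
        for hint in [Hint::InlineHint, Hint::NoHint] {
            out.push(InlinePolicy { placement, hint });
        }
    }
    out
}
```

Of the six combinations, the four spellings cover only the `OneCgu` and `AllCgus` rows. The two `UpstreamOnly` points correspond to not codegen'ing downstream at all, where the hint has nothing to act on in the downstream CGUs.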
Naturally, I've thought very hard about the naming here and it's likely the final naming. More practically, I'm not sure whether it's really all that useful to expose so many knobs. I don't think there's anything stopping rustc from doing it per se, but there may not be a ton of value gained from doing so.
One other possible optimization, which rustc does not do today (AFAIK), is to use the available_externally linkage in LLVM. This would allow rustc to inline the LLVM IR for a function, but if LLVM decides not to inline it then it doesn't actually codegen the function; instead it just references the symbol as an external symbol. This is highly applicable to the #[inline] attribute since you typically write it on otherwise-concrete functions (generics are already inlined all over the place), and by being a concrete function you know that the symbol could exist in an upstream crate (depending on visibility). This is only really an optimization for compile times, however, not for run times.
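For reference, this is the kind of code the concrete-versus-generic distinction refers to. It is only a small illustration with made-up function names (`clamp_to_byte`, `largest`); the comments restate the behavior described above rather than anything checked against rustc internals.

```rust
/// A concrete (non-generic) function. `#[inline]` is what makes its body
/// available to downstream crates; because it is concrete, a compiled
/// symbol for it could also exist in the defining (upstream) crate.
#[inline]
pub fn clamp_to_byte(x: i32) -> u8 {
    x.clamp(0, 255) as u8
}

/// A generic function. It is instantiated in whichever crate uses it with
/// a concrete `T`, so its body is already available there without `#[inline]`.
pub fn largest<T: PartialOrd + Copy>(items: &[T]) -> Option<T> {
    let mut iter = items.iter().copied();
    let first = iter.next()?;
    Some(iter.fold(first, |best, x| if x > best { x } else { best }))
}
```

With something like available_externally, a downstream crate would see the IR for `clamp_to_byte` as an inlining candidate, but a call that LLVM chose not to inline would link against the upstream copy instead of a freshly codegen'd one.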