Enable cross-crate inlining without suggesting inlining

IIUC, if LTO is disabled, then cross-crate inlining is only an option to the compiler if a function is given the #[inline] attribute. However, that attribute also has a second function: it suggests that the compiler perform inlining. Is there any way to enable cross-crate inlining without giving the compiler a hint one way or the other? I just want it to be an option that the compiler can take advantage of, but I want to leave the decision of whether to do it or not up to the compiler.

It's my understanding that Rust's #[inline] is what LLVM calls inlinehint, which still leaves it up to the compiler whether to actually inline the function (as opposed to LLVM's alwaysinline).

What's the problem you're seeing, though? Why not use LTO to do this? EDIT: Or alternatively, what badness are you seeing from using #[inline] on things?

I would suggest not disabling LTO if you want to keep the optimizations LTO provides. LTO wouldn't be needed if what you're asking for existed.

In addition to #[inline] functions, generic functions can be inlined across crates. Also, as scottmcm said, it's still only a suggestion; it increases the chances of inlining. There's also #[inline(always)], which is a very strong weighting toward inlining but not a strict guarantee. For instance, recursive functions can't be completely inlined, since they would then be infinitely large.
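To make the difference concrete, here is a minimal sketch (the function names are made up for illustration). Both functions compile with #[inline(always)], but only the first can be fully flattened into its callers; for the recursive one the optimizer can peel at most a few levels, so a real call has to remain somewhere.

```rust
/// A leaf function: with `#[inline(always)]` the optimizer is strongly
/// pushed to paste this body into every caller.
#[inline(always)]
pub fn double(n: u64) -> u64 {
    n * 2
}

/// A recursive function: the attribute is accepted, but the body cannot be
/// expanded into itself without end, so complete inlining is impossible.
#[inline(always)]
pub fn factorial(n: u64) -> u64 {
    if n <= 1 {
        1
    } else {
        n * factorial(n - 1)
    }
}
```

Either way the attribute changes only how the code is generated, not what it computes.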

These are for crates that I'm publishing, and I want users who are not themselves using LTO to still be able to benefit from inlining, especially considering that many of the functions I'm providing are trivial (e.g., getters and such), so function call overhead is very high compared to their execution cost.
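For illustration, this is roughly the kind of code in question. It is a sketch with a hypothetical `Point` type, not taken from any real crate. Marking the getters #[inline] is what makes their bodies available to downstream crates built without LTO, at the cost of also attaching the hint.

```rust
/// A small public type, as a published library crate might expose it.
pub struct Point {
    x: i32,
    y: i32,
}

impl Point {
    pub fn new(x: i32, y: i32) -> Self {
        Point { x, y }
    }

    /// Trivial getter: the call overhead would dwarf the work done, so the
    /// body should be visible to callers in other crates.
    #[inline]
    pub fn x(&self) -> i32 {
        self.x
    }

    /// Same reasoning as `x`.
    #[inline]
    pub fn y(&self) -> i32 {
        self.y
    }
}
```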

If it's a generic function (i.e. if there are one or more type parameters on the fn or the surrounding impl – lifetime parameters don’t count) then it can be inlined cross-crate without the #[inline] hint.
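A short sketch of that distinction, with hypothetical function names: the first function has a type parameter, so it is instantiated in the crate that calls it and its body is available there; the second only has a lifetime parameter, so it is compiled once in the defining crate like any other concrete function.

```rust
/// Generic over `T`: monomorphized in the calling crate, so it can be
/// inlined there without `#[inline]`.
pub fn first<T: Copy>(items: &[T]) -> Option<T> {
    items.first().copied()
}

/// Only a lifetime parameter: lifetimes are erased before codegen, so this
/// counts as a concrete function and would need `#[inline]` (or LTO) to be
/// inlined into another crate.
pub fn first_word<'a>(text: &'a str) -> &'a str {
    text.split_whitespace().next().unwrap_or("")
}
```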

Otherwise, there is no separate attribute to enable inlining without also providing a hint. As I understand it, the hint only changes some thresholds in the optimizer. LLVM is generally very eager to inline small functions even without the hint, so the presence of the hint is unlikely to change its output when it comes to small functions like getters and setters. It might make a difference on fairly large functions, though.

With multiple codegen units (the default for a long while now), #[inline] has further effects, as alexcrichton writes here: https://github.com/rust-lang/hashbrown/pull/119#issuecomment-537539046

No, #[inline] is very different than simply just an inline hint. As I mentioned before, there's no equivalent in C++ for what #[inline] does. In debug mode rustc basically ignores #[inline], pretending you didn't even write it. In release mode the compiler will, by default, codegen an #[inline] function into every single referencing codegen unit, and then it will also add inlinehint. This means that if you have 16 CGUs and they all reference a hash map, every single one is getting the entire hash map implementation inlined into it.

This means it is a pretty forceful annotation and I think it makes sense to ask for something like #[inline(enable)] or #[inline(nohint)] that doesn't have such strong effects. You'd want a base attribute to just enable inlining, then on top of that one can add the inlinehint (stronger hint for inlining) and/or whether to emit the function in every codegen unit.

(This topic brings documentation to mind: how do we share this inlining knowledge in the ecosystem? Would it be feasible to include non-normative descriptions of what inlining can entail in the reference? See the current Rust reference description of #[inline].)

Gotcha, that's a shame.

@alexcrichton do you have a sense of whether folks might be amenable to a proposal like this? If so, I'd be happy to write one up.

I'm not sure how separable these things are. Doesn't it have to get emitted into the CGU for (non-LTO) LLVM to be able to inline it? (Since otherwise it doesn't know the implementation.)

I suppose MIR inlining adds even more wrinkles here...

There are, in theory, two axes to how rustc handles inline functions, assuming that the purpose of inlining is to avoid the need for LTO:

  • One is how rustc codegens functions into foreign crates. Currently #[inline] codegens it into all referencing codegen units. Another possibility is to codegen it into only one unit and have other units reference the original function from there.
  • The next axis is whether rustc applies the inlinehint annotation in LLVM, or how aggressively we tell LLVM to inline the function. Note that this attribute only applies to functions in the same CGU.

For the first axis we have three options: don't codegen into downstream crates, codegen into one CGU of any referencing downstream crate, or codegen into all CGUs. For the second we have two options: either apply the attribute or not.

When only codegen'ing into one CGU we would generally rely on cross-CGU-inlining to happen with ThinLTO when a crate does its own passes of ThinLTO.

So, personally, I think there's possible room for

  • #[inline] - codegen into all CGUs, inlinehint
  • #[inline(probably)] - codegen into all CGUs, no inlinehint
  • #[inline(onecgu)] - codegen into a singular referencing CGU, inlinehint
  • #[inline(maybe)] - codegen into a singular referencing CGU, no inlinehint

Naturally the choice of naming here I've thought very hard about and is likely the final naming. More practically I'm not sure whether it's really all that useful to expose so many knobs. I don't think there's anything stopping rustc from doing it per se, but there may not be a ton of value gained from doing so.
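To keep the four combinations straight, they can be written down as a small lookup table. This is only a model of the proposal: the attribute spellings are the placeholder names from the list above, and the `Placement`, `InlinePolicy`, and `policy_for` names are invented for this sketch. None of it reflects anything rustc implements.

```rust
/// Axis one: where the function body is emitted in a downstream crate.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Placement {
    /// Emit a copy into every referencing codegen unit.
    AllCgus,
    /// Emit into a single referencing codegen unit.
    OneCgu,
}

/// One point in the two-axis design space.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct InlinePolicy {
    pub placement: Placement,
    /// Axis two: whether LLVM's `inlinehint` would be applied.
    pub inline_hint: bool,
}

/// Maps the placeholder attribute spellings to their policy.
/// Returns `None` for anything that is not part of the proposal.
pub fn policy_for(attr: &str) -> Option<InlinePolicy> {
    let (placement, inline_hint) = match attr {
        "inline" => (Placement::AllCgus, true),
        "inline(probably)" => (Placement::AllCgus, false),
        "inline(onecgu)" => (Placement::OneCgu, true),
        "inline(maybe)" => (Placement::OneCgu, false),
        _ => return None,
    };
    Some(InlinePolicy { placement, inline_hint })
}
```

The third placement option, not emitting into downstream crates at all, is simply the absence of any attribute, so it does not appear in the table.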

One other possible optimization that rustc does not do today (AFAIK) is to use the available_externally linkage in LLVM. This would allow rustc to hand LLVM the IR for a function so that it can be inlined, but if LLVM decides not to inline it, the function isn't actually codegen'd; the call just references the symbol as an external symbol. This is highly applicable to the #[inline] attribute since you typically write it on otherwise-concrete functions (generics are already inlined all over the place), and because the function is concrete you know that the symbol could exist in an upstream crate (depending on visibility). This is only really an optimization for compile times, however, not for run times.
