Inlining non-generic call chains across crates

Let's say I have the following code:

```rust
pub fn small_foo() {
    small_bar()
}

fn small_bar() {
    // ...
}
```

The way rustc works today, if I want `small_foo` to be inlinable across crate boundaries, I need to at least write `#[inline] pub fn small_foo()`. Question: do I need `#[inline]` on `small_bar` as well?

From my fuzzy recollections of various profiling sessions, I believe the answer is generally "yes". Is the definitive answer documented somewhere? Are there any ready-made benchmarks I can run myself?


A quick test shows that for truly trivial cases (like `a + b`), you get full inlining.
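The "truly trivial" case looks roughly like the sketch below. The names `add` and `caller` are hypothetical; in a real experiment the two functions would live in separate crates, and you would inspect the assembly generated for `caller` (e.g. with `cargo asm`) to see whether a call to `add` remains.

```rust
// Hypothetical upstream crate: a trivial leaf function with no further calls.
// With `#[inline]`, its MIR is encoded in the crate metadata, so a downstream
// crate can instantiate it locally and LLVM can fold the addition into the caller.
#[inline]
pub fn add(a: u32, b: u32) -> u32 {
    a.wrapping_add(b)
}

// Hypothetical downstream caller (would live in a different crate in the real test).
pub fn caller(x: u32) -> u32 {
    add(x, 1)
}
```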

I was getting some strange output from `cargo asm`, though, so I'm not sure how much to trust it.

My guess from these observations: one of two cases is true. In addition to the LLVM `inlinehint` attribute, either

  • `#[inline]` exposes the MIR of the annotated function to downstream crates, but nothing more, so calls to unannotated functions are LLVM inlining barriers: you get MIR optimization but not LLVM inlining, and MIR inlining handles some truly trivial cases; or
  • `#[inline]` exposes the fully optimized version of the function as compiled locally, after LLVM inlining. That version only reflects information known inside the upstream crate, so knowledge about the arguments at a downstream call site cannot drive further inlining through non-`#[inline]` calls, even where that knowledge would change the inlining decisions.
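To poke at the "inlining barrier" in the first hypothesis without setting up two crates, one option is a single-crate stand-in where the callee is marked `#[inline(never)]`. This is only an analogy (assumption: a non-`#[inline]`, non-generic function from another crate typically looks to LLVM like an opaque external symbol when LTO is off); the names `opaque_bar` and `outer_foo` are hypothetical.

```rust
// Single-crate stand-in for a callee LLVM cannot see into.
// `#[inline(never)]` asks the compiler not to inline this function, roughly
// mimicking how a non-`#[inline]` function from another crate appears to
// LLVM without LTO (an analogy, not an exact model).
#[inline(never)]
fn opaque_bar(x: u32) -> u32 {
    x.wrapping_mul(2)
}

// The outer function is inlinable; even when it is inlined into its caller,
// the call to `opaque_bar` is expected to remain a real call in the assembly.
#[inline]
pub fn outer_foo(x: u32) -> u32 {
    opaque_bar(x).wrapping_add(1)
}
```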

I, of course, have no actual knowledge of what's going on, and this is just an educated guess that I had fun constructing.


This makes sense: transitive inlining by default could blow up binary size, which would be undesirable except in trivial cases.
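A back-of-envelope model of the size concern, under a deliberately simplified and hypothetical assumption: every function in a chain calls the next one at the same number of call sites (`fanout`), and every call site gets inlined. The innermost body is then duplicated `fanout^depth` times inside the outermost function.

```rust
/// Hypothetical model: a call chain `depth` edges long in which each function
/// calls the next one at `fanout` distinct call sites. If every call site is
/// inlined, the innermost body appears `fanout^depth` times in the outermost
/// function. Saturates instead of overflowing for large inputs.
pub fn inlined_copies(fanout: u64, depth: u32) -> u64 {
    fanout.saturating_pow(depth)
}
```

For example, `f` calling `g` at 3 sites, with `g` calling `h` at 3 sites, gives `inlined_copies(3, 2) == 9` copies of `h` inside `f`.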

I experimented myself:

The answer appears to be "you need both `#[inline]`s" 🙁
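Applied to the original `small_foo`/`small_bar` example, that conclusion means annotating the whole chain in the upstream crate. In this sketch the signatures are hypothetically changed to take and return a `u32` so there is something to observe; the original functions take no arguments.

```rust
// Upstream crate: every non-generic function in the chain that should be
// inlinable downstream carries `#[inline]`.
#[inline]
pub fn small_foo(x: u32) -> u32 {
    small_bar(x).wrapping_add(1)
}

// The private helper needs `#[inline]` too; otherwise, without LTO, downstream
// crates typically see only an external symbol that LLVM cannot inline.
#[inline]
fn small_bar(x: u32) -> u32 {
    x.wrapping_mul(2)
}
```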
