The way rustc works today, if I want to enable inlining of small_foo across crates, I need to at least write #[inline] pub fn small_foo(). Question: do I need #[inline] pub fn small_bar() as well?
From my fuzzy recollections of various profiling sessions, I believe that the answer is generally "yes". Is the definitive answer documented somewhere? Are there any ready-made benchmarks I can run myself?
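To make the question concrete, here is a minimal sketch of the setup I have in mind. It assumes small_foo calls small_bar; the function bodies and the u32 signatures are made up for illustration, only the names come from the question.

```rust
// Hypothetical contents of the upstream library crate.

/// Not annotated. Whether a downstream crate can inline this call
/// through small_foo is exactly the open question.
pub fn small_bar(x: u32) -> u32 {
    x.wrapping_mul(2)
}

/// Annotated, so its body is made available to downstream crates.
#[inline]
pub fn small_foo(x: u32) -> u32 {
    small_bar(x).wrapping_add(1)
}
```

A downstream crate would then call small_foo in a hot loop and check the generated assembly for a remaining call to small_bar.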
A quick test shows that for truly trivial cases (as trivial as a + b), you get a full inline.
I was getting some strange output from cargo asm, though, so I don't know how much I trust it.
My guess from observations: one of two cases is true; in addition to the LLVM inlinehint, either
#[inline] gives you access to the MIR of the annotated function, but no more, meaning calls to unannotated functions are LLVM inlining barriers; you get MIR optimization but not LLVM; MIR inlining is working for some truly trivial cases; or
#[inline] gets you the full locally optimized version of the function, after LLVM inlining; this only considers locally known information in the subcrate; knowledge about the arguments cannot be used to drive further inlining through non-#[inline] calls, even if the knowledge would change inlining decisions.
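A sketch of the kind of situation the second case describes, where knowing an argument would change the inlining decision. Everything here is hypothetical (the names bar_with_flag and foo_fast, the flag parameter, the bodies), and it does not show what rustc actually does, only the shape of code where the two guesses would behave differently.

```rust
// Hypothetical callee in the upstream crate, not marked #[inline].
pub fn bar_with_flag(x: u64, slow_path: bool) -> u64 {
    if slow_path {
        // Stand-in for a large body that makes the function look
        // too expensive to inline when the flag is unknown.
        (0..x).fold(0u64, |acc, i| acc.wrapping_add(i.wrapping_mul(i)))
    } else {
        x.wrapping_add(1)
    }
}

// Annotated wrapper. With slow_path fixed to false, the callee would
// collapse to a single add if it were inlined here.
#[inline]
pub fn foo_fast(x: u64) -> u64 {
    bar_with_flag(x, false)
}
```

If the guess is right, a downstream caller of foo_fast would still see a real call to bar_with_flag unless that function is annotated too.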
I, of course, have no actual knowledge of what's going on, and this is just an educated guess that I had fun constructing.
This makes sense because nested inlining by default could lead to explosions in binary size, which would be undesirable except in trivial cases.