Help with cargo llvm-lines

Here is the output of cargo llvm-lines:

  39196 (100%)  2255 (100%)  (TOTAL)
   5329 (13.6%)   87 (3.9%)  alloc::alloc::box_free
   4097 (10.5%)   85 (3.8%)  alloc::rc::Rc<T>::new
   1530 (3.9%)    85 (3.8%)  alloc::boxed::Box<T,A>::leak
    788 (2.0%)    87 (3.9%)  core::ptr::non_null::NonNull<T>::cast
    750 (1.9%)    90 (4.0%)  core::mem::manually_drop::ManuallyDrop<T>::new
    699 (1.8%)    87 (3.9%)  core::ptr::unique::Unique<T>::cast
    680 (1.7%)    85 (3.8%)  <core::ptr::non_null::NonNull<T> as core::convert::From<&mut T>>::from
    623 (1.6%)    88 (3.9%)  core::ptr::non_null::NonNull<T>::as_ref
    526 (1.3%)    87 (3.9%)  core::ptr::unique::Unique<T>::as_ref
    524 (1.3%)     4 (0.2%)  hashbrown::map::HashMap<K,V,S,A>::insert
    516 (1.3%)    86 (3.8%)  core::ptr::unique::Unique<T>::as_ptr
    511 (1.3%)    73 (3.2%)  alloc::rc::Rc<T>::from_inner
    510 (1.3%)    85 (3.8%)  <T as core::convert::Into<U>>::into
    480 (1.2%)    94 (4.2%)  core::ptr::non_null::NonNull<T>::as_ptr
    476 (1.2%)     4 (0.2%)  hashbrown::raw::RawTable<T,A>::insert
    431 (1.1%)    17 (0.8%)  core::ptr::read
    425 (1.1%)    85 (3.8%)  <core::mem::manually_drop::ManuallyDrop<T> as core::ops::deref::Deref>::deref
    392 (1.0%)     1 (0.0%)  xos_main::xos_app_util::Xos_App_Util::gen__gfx_dom

Here, I think xos_main::xos_app_util::... is the first line that comes from code I wrote directly. Given that the other code is "out of my control", is trying to optimize this hopeless? I don't understand what's happening with all the Rc / Box.

Pointing out the obvious, it means you use a lot of Rcs, NonNulls, and Boxes.

The compiler monomorphises generic structs/functions, so if monomorphised Rc<T>::new() accounts for a large amount of LLVM IR, that means either:

  1. You are calling Rc::new() with lots of different types. This often happens accidentally when you use Rc inside generic code, so using trait objects instead of generics may reduce the number of monomorphised copies in your binary; or
  2. Your program contains so little code that seemingly trivial functions like Rc::new() and box_free take up a large share of codegen time.

Don't forget that those copies of Rc::new() and friends only exist because your code calls them at some point (possibly via a transitive dependency). So it's completely within your control.
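To make the generics-versus-trait-objects point concrete, here is a minimal sketch. The function names (share_generic, share_dyn, demo) are made up for illustration and have nothing to do with your crate; the only thing it tries to show is where the extra copies come from.

```rust
use std::fmt::Display;
use std::rc::Rc;

// Generic version: the compiler emits one copy of this function, and of
// every generic function it calls (including Rc::<T>::new), per distinct T.
fn share_generic<T: Display>(value: T) -> String {
    let shared = Rc::new(value);
    format!("[{}]", shared)
}

// Trait-object version: this body is compiled once, no matter which
// concrete type sits behind the `dyn Display`.
fn share_dyn(shared: Rc<dyn Display>) -> String {
    format!("[{}]", shared)
}

// Hypothetical driver that exercises both versions with two types.
fn demo() -> Vec<String> {
    vec![
        // Two instantiations: share_generic::<i32> and share_generic::<&str>.
        share_generic(1),
        share_generic("two"),
        // Both calls go through the single compiled copy of share_dyn.
        share_dyn(Rc::new(1)),
        share_dyn(Rc::new("two")),
    ]
}
```

Note that each concrete type still gets its own Rc::new at the place where the value is first created, so trait objects alone won't bring that count down. What they save is the code downstream of the Rc, which no longer gets duplicated per type. If the Rc::new count itself is the problem, the usual remedy is wrapping fewer distinct types.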

  Lines         Copies       Function name
  -----         ------       -------------
  39196 (100%)  2255 (100%)  (TOTAL)
   5329 (13.6%)   87 (3.9%)  alloc::alloc::box_free
   4097 (10.5%)   85 (3.8%)  alloc::rc::Rc<T>::new
   1530 (3.9%)    85 (3.8%)  alloc::boxed::Box<T,A>::leak
    788 (2.0%)    87 (3.9%)  core::ptr::non_null::NonNull<T>::cast

It's not actually that much, is it? It's saying that I'm calling Rc::new with 85 different types for T? I can believe that.

  Lines         Copies      Function name
  -----         ------      -------------
  20906 (100%)  998 (100%)  (TOTAL)
    524 (2.5%)    4 (0.4%)  hashbrown::map::HashMap<K,V,S,A>::insert
    476 (2.3%)    4 (0.4%)  hashbrown::raw::RawTable<T,A>::insert
    431 (2.1%)   17 (1.7%)  core::ptr::read
    380 (1.8%)    4 (0.4%)  <hashbrown::map::HashMap<K,V,S,A> as core::iter::traits::collect::Extend<(K,V)>>::extend

Turns out you're right, I can control how much Rc is used (in fact, it's close to 0 now, I think). Nevertheless, after reducing llvm-lines from 40k to 20k, compile time for the crate only went from 2.5s to 2.1s. (This is release mode only, opt-level = 3.)

Is there anything besides llvm-lines worth looking into? (while keeping opt-level = 3)