Can I speed up the LLVM_module_codegen_emit_obj phase, which takes about 60% of total compile time?

I'm trying to understand how to reduce compile times, at least for warm (incremental) builds, i.e. a rebuild after a single change in the code.

I ran:

cargo +nightly rustc -- -Z self-profile

and then:

summarize summarize .\prj-0007408.mm_profdata > summarize.txt

the result is:

+-------------------------------------------------------------------------+-----------+-----------------+----------+------------+---------------------------------+
| Item                                                                    | Self time | % of total time | Time     | Item count | Incremental result hashing time |
+-------------------------------------------------------------------------+-----------+-----------------+----------+------------+---------------------------------+
| LLVM_module_codegen_emit_obj                                            | 25.37s    | 59.193          | 25.37s   | 257        | 0.00ns                          |
+-------------------------------------------------------------------------+-----------+-----------------+----------+------------+---------------------------------+
| LLVM_passes                                                             | 4.75s     | 11.087          | 4.75s    | 1          | 0.00ns                          |
+-------------------------------------------------------------------------+-----------+-----------------+----------+------------+---------------------------------+
| codegen_module                                                          | 3.22s     | 7.522           | 4.61s    | 256        | 0.00ns                          |
+-------------------------------------------------------------------------+-----------+-----------------+----------+------------+---------------------------------+
| run_linker                                                              | 1.39s     | 3.234           | 1.39s    | 1          | 0.00ns                          |
+-------------------------------------------------------------------------+-----------+-----------------+----------+------------+---------------------------------+
| LLVM_module_optimize                                                    | 1.39s     | 3.231           | 1.39s    | 257        | 0.00ns                          |
+-------------------------------------------------------------------------+-----------+-----------------+----------+------------+---------------------------------+
| codegen_select_candidate                                                | 794.33ms  | 1.853           | 1.06s    | 16696      | 10.49ms                         |
+-------------------------------------------------------------------------+-----------+-----------------+----------+------------+---------------------------------+
| codegen_crate                                                           | 640.28ms  | 1.494           | 8.57s    | 1          | 0.00ns                          |
+-------------------------------------------------------------------------+-----------+-----------------+----------+------------+---------------------------------+
| LLVM_module_codegen                                                     | 528.54ms  | 1.233           | 25.90s   | 257        | 0.00ns                          |
+-------------------------------------------------------------------------+-----------+-----------------+----------+------------+---------------------------------+
| normalize_canonicalized_projection_ty                                   | 460.66ms  | 1.075           | 478.70ms | 5047       | 4.78ms                          |
+-------------------------------------------------------------------------+-----------+-----------------+----------+------------+---------------------------------+
| fn_abi_of_instance                                                      | 436.27ms  | 1.018           | 1.09s    | 44387      | 81.20ms                         |
+-------------------------------------------------------------------------+-----------+-----------------+----------+------------+---------------------------------+
| monomorphization_collector_graph_walk                                   | 389.12ms  | 0.908           | 2.87s    | 1          | 0.00ns                          |
+-------------------------------------------------------------------------+-----------+-----------------+----------+------------+---------------------------------+
| symbol_name                                                             | 299.49ms  | 0.699           | 351.51ms | 41811      | 17.27ms                         |
+-------------------------------------------------------------------------+-----------+-----------------+----------+------------+---------------------------------+
| evaluate_obligation                                                     | 286.66ms  | 0.669           | 439.14ms | 23606      | 8.91ms                          |
+-------------------------------------------------------------------------+-----------+-----------------+----------+------------+---------------------------------+
| layout_of                                                               | 249.84ms  | 0.583           | 1.46s    | 89521      | 57.78ms                         |
+-------------------------------------------------------------------------+-----------+-----------------+----------+------------+---------------------------------+
| incr_comp_encode_dep_graph                                              | 185.46ms  | 0.433           | 185.46ms | 702136     | 0.00ns                          |
+-------------------------------------------------------------------------+-----------+-----------------+----------+------------+---------------------------------+
| mir_shims                                                               | 165.46ms  | 0.386           | 371.33ms | 5515       | 35.78ms                         |
+-------------------------------------------------------------------------+-----------+-----------------+----------+------------+---------------------------------+
| metadata_decode_entry_optimized_mir                                     | 165.18ms  | 0.385           | 165.18ms | 4995       | 0.00ns                          |
+-------------------------------------------------------------------------+-----------+-----------------+----------+------------+---------------------------------+
| free_global_ctxt                                                        | 131.65ms  | 0.307           | 131.74ms | 1          | 0.00ns                          |
+-------------------------------------------------------------------------+-----------+-----------------+----------+------------+---------------------------------+

... others

Total cpu time: 42.8644998s
+----------------------------+----------------+
| Item                       | Artifact Size  |
+----------------------------+----------------+
| cgu_instructions           | 906183 bytes   |
+----------------------------+----------------+
| codegen_unit_size_estimate | 792989 bytes   |
+----------------------------+----------------+
| dep_graph                  | 35603902 bytes |
+----------------------------+----------------+
| linked_artifact            | 36443136 bytes |
+----------------------------+----------------+
| object_file                | 39676471 bytes |
+----------------------------+----------------+
| query_cache                | 10406765 bytes |
+----------------------------+----------------+
| work_product_index         | 14780 bytes    |
+----------------------------+----------------+
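
For context, here is a quick tally of the rows whose names start with LLVM_. This is only a rough sketch: the figures are copied by hand from the table above, and I'm assuming the self times can simply be added up and compared against the reported total CPU time.

```python
# Self times in seconds, copied by hand from the summarize table above.
self_time_s = {
    "LLVM_module_codegen_emit_obj": 25.37,
    "LLVM_passes": 4.75,
    "LLVM_module_optimize": 1.39,
    "LLVM_module_codegen": 0.52854,
}

# "Total cpu time" as reported by summarize.
total_cpu_s = 42.8644998

# Sum the LLVM-prefixed rows and express them as a share of the total.
llvm_s = sum(self_time_s.values())
llvm_share_pct = 100.0 * llvm_s / total_cpu_s

print(f"LLVM rows: {llvm_s:.2f}s of {total_cpu_s:.2f}s ({llvm_share_pct:.1f}%)")
```

If that assumption holds, the LLVM rows together come to roughly three quarters of the measured time, so the emit_obj row is not the only LLVM cost, just the largest one.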

and then:

cargo llvm-lines

and the result (top entries only):

  Lines                 Copies               Function name
  -----                 ------               -------------
  948075                38104                (TOTAL)
   40764 (4.3%,  4.3%)    215 (0.6%,  0.6%)  async_graphql::resolver_utils::container::Fields::add_set::{{closure}}
   15855 (1.7%,  6.0%)    735 (1.9%,  2.5%)  core::result::Result<T,E>::map_err
   13524 (1.4%,  7.4%)     84 (0.2%,  2.7%)  alloc::raw_vec::RawVec<T,A>::grow_amortized
   13442 (1.4%,  8.8%)     48 (0.1%,  2.8%)  <futures_util::stream::futures_unordered::FuturesUnordered<Fut> as futures_core::stream::Stream>::poll_next
   12384 (1.3%, 10.1%)     43 (0.1%,  3.0%)  async_graphql::resolver_utils::container::resolve_container_inner::{{closure}}
   10425 (1.1%, 11.2%)     75 (0.2%,  3.1%)  alloc::raw_vec::RawVec<T,A>::try_allocate_in
   10191 (1.1%, 12.3%)     43 (0.1%,  3.3%)  async_graphql::resolver_utils::container::Fields::add_set
    9744 (1.0%, 13.3%)     48 (0.1%,  3.4%)  alloc::raw_vec::RawVec<T,A>::shrink
    8464 (0.9%, 14.2%)   2116 (5.6%,  8.9%)  core::pin::Pin<Ptr>::new_unchecked
    8304 (0.9%, 15.1%)    162 (0.4%,  9.4%)  <alloc::sync::Weak<T,A> as core::ops::drop::Drop>::drop
    8160 (0.9%, 16.0%)     48 (0.1%,  9.5%)  <futures_util::future::try_join_all::TryJoinAll<F> as core::future::future::Future>::poll
    7912 (0.8%, 16.8%)     23 (0.1%,  9.6%)  async_graphql::resolver_utils::list::resolve_list::{{closure}}
    7452 (0.8%, 17.6%)     46 (0.1%,  9.7%)  async_graphql::resolver_utils::list::resolve_list::{{closure}}::{{closure}}
    7440 (0.8%, 18.4%)     48 (0.1%,  9.8%)  <futures_util::stream::futures_ordered::FuturesOrdered<Fut> as futures_core::stream::Stream>::poll_next
    7031 (0.7%, 19.1%)    293 (0.8%, 10.6%)  core::option::Option<T>::map
    6673 (0.7%, 19.8%)     48 (0.1%, 10.7%)  futures_util::stream::futures_unordered::FuturesUnordered<Fut>::new
    6050 (0.6%, 20.4%)     50 (0.1%, 10.8%)  alloc::vec::in_place_collect::from_iter_in_place
    5904 (0.6%, 21.1%)     48 (0.1%, 11.0%)  futures_util::stream::futures_unordered::ready_to_run_queue::ReadyToRunQueue<Fut>::dequeue
    5879 (0.6%, 21.7%)    288 (0.8%, 11.7%)  <core::result::Result<T,E> as core::ops::try_trait::Try>::branch
    5856 (0.6%, 22.3%)    139 (0.4%, 12.1%)  core::option::Option<T>::map_or
    5633 (0.6%, 22.9%)    215 (0.6%, 12.6%)  async_graphql::resolver_utils::container::Fields::add_set::{{closure}}::{{closure}}
    5473 (0.6%, 23.5%)     48 (0.1%, 12.8%)  futures_util::stream::futures_unordered::FuturesUnordered<Fut>::release_task
    5280 (0.6%, 24.0%)     48 (0.1%, 12.9%)  futures_util::stream::futures_unordered::FuturesUnordered<Fut>::unlink
    5232 (0.6%, 24.6%)     48 (0.1%, 13.0%)  futures_util::stream::futures_unordered::FuturesUnordered<Fut>::link
    5169 (0.5%, 25.1%)     83 (0.2%, 13.2%)  <alloc::vec::Vec<T> as alloc::vec::spec_from_iter_nested::SpecFromIterNested<T,I>>::from_iter
    5014 (0.5%, 25.7%)    162 (0.4%, 13.7%)  <alloc::boxed::Box<T,A> as core::ops::drop::Drop>::drop
    4881 (0.5%, 26.2%)     64 (0.2%, 13.8%)  core::iter::traits::iterator::Iterator::try_fold
    4790 (0.5%, 26.7%)     63 (0.2%, 14.0%)  <&T as async_graphql::base::OutputType>::resolve::{{closure}}
    4650 (0.5%, 27.2%)     75 (0.2%, 14.2%)  alloc::raw_vec::handle_reserve
    4628 (0.5%, 27.7%)     67 (0.2%, 14.4%)  alloc::vec::Vec<T,A>::extend_trusted
    4536 (0.5%, 28.1%)     24 (0.1%, 14.4%)  <async_graphql::types::merged_object::MergedObject<A,B> as async_graphql::resolver_utils::container::ContainerType>::resolve_field::{{closure}}
    4128 (0.4%, 28.6%)     48 (0.1%, 14.6%)  futures_util::stream::futures_unordered::FuturesUnordered<Fut>::push
    4089 (0.4%, 29.0%)    192 (0.5%, 15.1%)  core::iter::adapters::map::map_fold::{{closure}}
    4039 (0.4%, 29.4%)     25 (0.1%, 15.1%)  async_graphql::context::ContextBase<T>::get_param_value
    3968 (0.4%, 29.8%)    128 (0.3%, 15.5%)  alloc::raw_vec::RawVec<T,A>::current_memory
    3892 (0.4%, 30.3%)     99 (0.3%, 15.7%)  core::iter::adapters::map::map_try_fold::{{closure}}
    3504 (0.4%, 30.6%)     48 (0.1%, 15.8%)  <core::slice::iter::IterMut<T> as core::iter::traits::iterator::Iterator>::fold
    3390 (0.4%, 31.0%)     79 (0.2%, 16.1%)  core::iter::traits::iterator::Iterator::fold
    3386 (0.4%, 31.3%)    484 (1.3%, 17.3%)  alloc::boxed::Box<T>::new

First question: what does cargo +nightly rustc -- -Z self-profile actually do? Is it the same as cargo build without --release, or something else?

Second question: as you can see, LLVM_module_codegen_emit_obj accounts for about 60% of total time. What is it, and can I reduce it?
