I'm trying to understand how to reduce compile times, at least for warm (incremental) builds, i.e. after a single change in the code.
I ran:
cargo +nightly rustc -- -Z self-profile
and then:
summarize summarize .\prj-0007408.mm_profdata > summarize.txt
the result is:
+-------------------------------------------------------------------------+-----------+-----------------+----------+------------+---------------------------------+
| Item | Self time | % of total time | Time | Item count | Incremental result hashing time |
+-------------------------------------------------------------------------+-----------+-----------------+----------+------------+---------------------------------+
| LLVM_module_codegen_emit_obj | 25.37s | 59.193 | 25.37s | 257 | 0.00ns |
+-------------------------------------------------------------------------+-----------+-----------------+----------+------------+---------------------------------+
| LLVM_passes | 4.75s | 11.087 | 4.75s | 1 | 0.00ns |
+-------------------------------------------------------------------------+-----------+-----------------+----------+------------+---------------------------------+
| codegen_module | 3.22s | 7.522 | 4.61s | 256 | 0.00ns |
+-------------------------------------------------------------------------+-----------+-----------------+----------+------------+---------------------------------+
| run_linker | 1.39s | 3.234 | 1.39s | 1 | 0.00ns |
+-------------------------------------------------------------------------+-----------+-----------------+----------+------------+---------------------------------+
| LLVM_module_optimize | 1.39s | 3.231 | 1.39s | 257 | 0.00ns |
+-------------------------------------------------------------------------+-----------+-----------------+----------+------------+---------------------------------+
| codegen_select_candidate | 794.33ms | 1.853 | 1.06s | 16696 | 10.49ms |
+-------------------------------------------------------------------------+-----------+-----------------+----------+------------+---------------------------------+
| codegen_crate | 640.28ms | 1.494 | 8.57s | 1 | 0.00ns |
+-------------------------------------------------------------------------+-----------+-----------------+----------+------------+---------------------------------+
| LLVM_module_codegen | 528.54ms | 1.233 | 25.90s | 257 | 0.00ns |
+-------------------------------------------------------------------------+-----------+-----------------+----------+------------+---------------------------------+
| normalize_canonicalized_projection_ty | 460.66ms | 1.075 | 478.70ms | 5047 | 4.78ms |
+-------------------------------------------------------------------------+-----------+-----------------+----------+------------+---------------------------------+
| fn_abi_of_instance | 436.27ms | 1.018 | 1.09s | 44387 | 81.20ms |
+-------------------------------------------------------------------------+-----------+-----------------+----------+------------+---------------------------------+
| monomorphization_collector_graph_walk | 389.12ms | 0.908 | 2.87s | 1 | 0.00ns |
+-------------------------------------------------------------------------+-----------+-----------------+----------+------------+---------------------------------+
| symbol_name | 299.49ms | 0.699 | 351.51ms | 41811 | 17.27ms |
+-------------------------------------------------------------------------+-----------+-----------------+----------+------------+---------------------------------+
| evaluate_obligation | 286.66ms | 0.669 | 439.14ms | 23606 | 8.91ms |
+-------------------------------------------------------------------------+-----------+-----------------+----------+------------+---------------------------------+
| layout_of | 249.84ms | 0.583 | 1.46s | 89521 | 57.78ms |
+-------------------------------------------------------------------------+-----------+-----------------+----------+------------+---------------------------------+
| incr_comp_encode_dep_graph | 185.46ms | 0.433 | 185.46ms | 702136 | 0.00ns |
+-------------------------------------------------------------------------+-----------+-----------------+----------+------------+---------------------------------+
| mir_shims | 165.46ms | 0.386 | 371.33ms | 5515 | 35.78ms |
+-------------------------------------------------------------------------+-----------+-----------------+----------+------------+---------------------------------+
| metadata_decode_entry_optimized_mir | 165.18ms | 0.385 | 165.18ms | 4995 | 0.00ns |
+-------------------------------------------------------------------------+-----------+-----------------+----------+------------+---------------------------------+
| free_global_ctxt | 131.65ms | 0.307 | 131.74ms | 1 | 0.00ns |
+-------------------------------------------------------------------------+-----------+-----------------+----------+------------+---------------------------------+
... others
Total cpu time: 42.8644998s
+----------------------------+----------------+
| Item | Artifact Size |
+----------------------------+----------------+
| cgu_instructions | 906183 bytes |
+----------------------------+----------------+
| codegen_unit_size_estimate | 792989 bytes |
+----------------------------+----------------+
| dep_graph | 35603902 bytes |
+----------------------------+----------------+
| linked_artifact | 36443136 bytes |
+----------------------------+----------------+
| object_file | 39676471 bytes |
+----------------------------+----------------+
| query_cache | 10406765 bytes |
+----------------------------+----------------+
| work_product_index | 14780 bytes |
+----------------------------+----------------+
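To put the first table in perspective, here is a rough sketch that adds up the self times of the rows whose names start with LLVM_ and compares the sum with the reported total CPU time. The numbers are copied from the output above (528.54ms written as 0.52854s); the function name llvm_share_percent is just something I made up for this illustration.

```rust
// Self times (in seconds) of the LLVM_* rows from the summarize output above.
const LLVM_SELF_TIMES: [f64; 4] = [
    25.37,   // LLVM_module_codegen_emit_obj
    4.75,    // LLVM_passes
    1.39,    // LLVM_module_optimize
    0.52854, // LLVM_module_codegen
];

// "Total cpu time" line from the same output.
const TOTAL_CPU_TIME: f64 = 42.8644998;

// Hypothetical helper: share of the total spent in LLVM_* items, in percent.
fn llvm_share_percent() -> f64 {
    let llvm_total: f64 = LLVM_SELF_TIMES.iter().sum();
    llvm_total / TOTAL_CPU_TIME * 100.0
}

fn main() {
    println!("LLVM_* share of total: {:.1}%", llvm_share_percent());
}
```

If I did the arithmetic right, that is roughly three quarters of the total, so almost all of the time goes to the LLVM backend rather than to the Rust frontend queries.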
and then:
cargo llvm-lines
and the result (only the top ones):
Lines Copies Function name
----- ------ -------------
948075 38104 (TOTAL)
40764 (4.3%, 4.3%) 215 (0.6%, 0.6%) async_graphql::resolver_utils::container::Fields::add_set::{{closure}}
15855 (1.7%, 6.0%) 735 (1.9%, 2.5%) core::result::Result<T,E>::map_err
13524 (1.4%, 7.4%) 84 (0.2%, 2.7%) alloc::raw_vec::RawVec<T,A>::grow_amortized
13442 (1.4%, 8.8%) 48 (0.1%, 2.8%) <futures_util::stream::futures_unordered::FuturesUnordered<Fut> as futures_core::stream::Stream>::poll_next
12384 (1.3%, 10.1%) 43 (0.1%, 3.0%) async_graphql::resolver_utils::container::resolve_container_inner::{{closure}}
10425 (1.1%, 11.2%) 75 (0.2%, 3.1%) alloc::raw_vec::RawVec<T,A>::try_allocate_in
10191 (1.1%, 12.3%) 43 (0.1%, 3.3%) async_graphql::resolver_utils::container::Fields::add_set
9744 (1.0%, 13.3%) 48 (0.1%, 3.4%) alloc::raw_vec::RawVec<T,A>::shrink
8464 (0.9%, 14.2%) 2116 (5.6%, 8.9%) core::pin::Pin<Ptr>::new_unchecked
8304 (0.9%, 15.1%) 162 (0.4%, 9.4%) <alloc::sync::Weak<T,A> as core::ops::drop::Drop>::drop
8160 (0.9%, 16.0%) 48 (0.1%, 9.5%) <futures_util::future::try_join_all::TryJoinAll<F> as core::future::future::Future>::poll
7912 (0.8%, 16.8%) 23 (0.1%, 9.6%) async_graphql::resolver_utils::list::resolve_list::{{closure}}
7452 (0.8%, 17.6%) 46 (0.1%, 9.7%) async_graphql::resolver_utils::list::resolve_list::{{closure}}::{{closure}}
7440 (0.8%, 18.4%) 48 (0.1%, 9.8%) <futures_util::stream::futures_ordered::FuturesOrdered<Fut> as futures_core::stream::Stream>::poll_next
7031 (0.7%, 19.1%) 293 (0.8%, 10.6%) core::option::Option<T>::map
6673 (0.7%, 19.8%) 48 (0.1%, 10.7%) futures_util::stream::futures_unordered::FuturesUnordered<Fut>::new
6050 (0.6%, 20.4%) 50 (0.1%, 10.8%) alloc::vec::in_place_collect::from_iter_in_place
5904 (0.6%, 21.1%) 48 (0.1%, 11.0%) futures_util::stream::futures_unordered::ready_to_run_queue::ReadyToRunQueue<Fut>::dequeue
5879 (0.6%, 21.7%) 288 (0.8%, 11.7%) <core::result::Result<T,E> as core::ops::try_trait::Try>::branch
5856 (0.6%, 22.3%) 139 (0.4%, 12.1%) core::option::Option<T>::map_or
5633 (0.6%, 22.9%) 215 (0.6%, 12.6%) async_graphql::resolver_utils::container::Fields::add_set::{{closure}}::{{closure}}
5473 (0.6%, 23.5%) 48 (0.1%, 12.8%) futures_util::stream::futures_unordered::FuturesUnordered<Fut>::release_task
5280 (0.6%, 24.0%) 48 (0.1%, 12.9%) futures_util::stream::futures_unordered::FuturesUnordered<Fut>::unlink
5232 (0.6%, 24.6%) 48 (0.1%, 13.0%) futures_util::stream::futures_unordered::FuturesUnordered<Fut>::link
5169 (0.5%, 25.1%) 83 (0.2%, 13.2%) <alloc::vec::Vec<T> as alloc::vec::spec_from_iter_nested::SpecFromIterNested<T,I>>::from_iter
5014 (0.5%, 25.7%) 162 (0.4%, 13.7%) <alloc::boxed::Box<T,A> as core::ops::drop::Drop>::drop
4881 (0.5%, 26.2%) 64 (0.2%, 13.8%) core::iter::traits::iterator::Iterator::try_fold
4790 (0.5%, 26.7%) 63 (0.2%, 14.0%) <&T as async_graphql::base::OutputType>::resolve::{{closure}}
4650 (0.5%, 27.2%) 75 (0.2%, 14.2%) alloc::raw_vec::handle_reserve
4628 (0.5%, 27.7%) 67 (0.2%, 14.4%) alloc::vec::Vec<T,A>::extend_trusted
4536 (0.5%, 28.1%) 24 (0.1%, 14.4%) <async_graphql::types::merged_object::MergedObject<A,B> as async_graphql::resolver_utils::container::ContainerType>::resolve_field::{{closure}}
4128 (0.4%, 28.6%) 48 (0.1%, 14.6%) futures_util::stream::futures_unordered::FuturesUnordered<Fut>::push
4089 (0.4%, 29.0%) 192 (0.5%, 15.1%) core::iter::adapters::map::map_fold::{{closure}}
4039 (0.4%, 29.4%) 25 (0.1%, 15.1%) async_graphql::context::ContextBase<T>::get_param_value
3968 (0.4%, 29.8%) 128 (0.3%, 15.5%) alloc::raw_vec::RawVec<T,A>::current_memory
3892 (0.4%, 30.3%) 99 (0.3%, 15.7%) core::iter::adapters::map::map_try_fold::{{closure}}
3504 (0.4%, 30.6%) 48 (0.1%, 15.8%) <core::slice::iter::IterMut<T> as core::iter::traits::iterator::Iterator>::fold
3390 (0.4%, 31.0%) 79 (0.2%, 16.1%) core::iter::traits::iterator::Iterator::fold
3386 (0.4%, 31.3%) 484 (1.3%, 17.3%) alloc::boxed::Box<T>::new
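For context, my understanding of the Copies column is that it counts how many times a generic function was instantiated (monomorphized) with different type parameters, each copy adding its own lines of LLVM IR. The sketch below is a hypothetical illustration of that, not code from my project; describe and describe_dyn are made-up names.

```rust
use std::fmt::Debug;

// Generic function: the compiler emits one copy of the body per concrete T
// that it is called with, which is what shows up as "Copies".
fn describe<T: Debug>(value: T) -> String {
    format!("{:?}", value)
}

// Non-generic alternative: a single copy of the body, using dynamic dispatch
// through a trait object instead of a type parameter.
fn describe_dyn(value: &dyn Debug) -> String {
    format!("{:?}", value)
}

fn main() {
    // Three instantiations: describe::<u8>, describe::<&str>, describe::<Vec<i32>>.
    println!("{}", describe(1u8));
    println!("{}", describe("two"));
    println!("{}", describe(vec![3, 4]));

    // One function body, no matter how many argument types are passed.
    println!("{}", describe_dyn(&1u8));
    println!("{}", describe_dyn(&"two"));
}
```

If that reading is correct, entries such as Fields::add_set::{{closure}} with 215 copies would be generic code in async_graphql instantiated once per type in my schema.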
First question: what does cargo +nightly rustc -- -Z self-profile actually do? Is it like cargo build without --release, or something else?
Second question: as you can see, LLVM_module_codegen_emit_obj takes about 60% of the total time. What is it, and can I improve it?