Compile time help

I have a 346-line crate foo_api that:

  1. is mostly generic structs/traits

  2. takes 2.5s to compile

  3. makes any crate that includes foo_api take 2.5s to compile

  4. gives me no insights when I run "cargo llvm-lines" on it

Question: what other commands can I run to get intuition on why the $#(# this crate is slow to compile (and why it slows down everything that includes it)?

My suspicion: it has something to do with generics generating code that is thrown away at link time.
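To make that suspicion concrete, here is a minimal sketch (not code from foo_api; both function names are hypothetical). A generic function is compiled once per concrete type it is used with, and that work generally happens in whichever crate instantiates it, which is one way a small generic crate can add compile time downstream. Moving the body into a non-generic inner function is a common way to shrink what gets duplicated.

```rust
use std::fmt::Display;

// Fully generic: the whole body is compiled again for every concrete `T`
// that any crate calls it with.
pub fn describe_generic<T: Display>(value: T) -> String {
    let text = value.to_string();
    let mut out = String::with_capacity(text.len() + 2);
    out.push('[');
    out.push_str(&text);
    out.push(']');
    out
}

// Thin generic shell: only the `to_string` call is duplicated per `T`;
// the string building lives in a non-generic inner function that is
// compiled once.
pub fn describe_thin<T: Display>(value: T) -> String {
    fn inner(text: &str) -> String {
        let mut out = String::with_capacity(text.len() + 2);
        out.push('[');
        out.push_str(text);
        out.push(']');
        out
    }
    inner(&value.to_string())
}

fn main() {
    // Two different `T`s here mean two copies of `describe_generic`.
    println!("{}", describe_generic(1u8));
    println!("{}", describe_generic("two"));
    println!("{}", describe_thin(3.5f64));
}
```

Whether this pattern helps in a given crate depends on how large the generic bodies are and how many distinct types they are used with, so it is worth measuring before and after.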


Do you have any macros? Any dependencies on other crates?

  1. It does depend on other crates. In theory, this should not matter. I am measuring:
touch foo_api/src/**.rs
cargo build --timings

and only looking at the part for foo_api.

  2. It involves 2 invocations of a macro_rules! macro I wrote and 1 invocation of a procedural macro I wrote. All 3 should be blazingly fast.

  3. I deeply suspect it has to do with the expansion of generics. I am looking for something that offers more information than cargo llvm-lines.


There's a fasterthanlime article about profiling builds that might give you some clues. One gotcha when I tested it locally: if you're in a workspace, the profile data gets saved to the workspace root, not the package root. And note that if you don't use a Chromium-based browser, Perfetto provides the same UI as a web application instead of it being built into your browser.

+-------------------------------------------------------------------------+-----------+-----------------+----------+------------+-----------------------+---------------------------------+
| Item                                                                    | Self time | % of total time | Time     | Item count | Incremental load time | Incremental result hashing time |
+-------------------------------------------------------------------------+-----------+-----------------+----------+------------+-----------------------+---------------------------------+
| monomorphization_collector_graph_walk                                   | 27.37ms   | 11.515          | 121.28ms | 1          | 0.00ns                | 0.00ns                          |
+-------------------------------------------------------------------------+-----------+-----------------+----------+------------+-----------------------+---------------------------------+
| specialization_graph_of                                                 | 24.61ms   | 10.357          | 24.61ms  | 0          | 24.61ms               | 0.00ns                          |
+-------------------------------------------------------------------------+-----------+-----------------+----------+------------+-----------------------+---------------------------------+
| metadata_decode_entry_exported_symbols                                  | 22.20ms   | 9.340           | 22.20ms  | 196        | 0.00ns                | 0.00ns                          |
+-------------------------------------------------------------------------+-----------+-----------------+----------+------------+-----------------------+---------------------------------+
| metadata_register_crate                                                 | 20.46ms   | 8.610           | 108.61ms | 196        | 0.00ns                | 0.00ns                          |
+-------------------------------------------------------------------------+-----------+-----------------+----------+------------+-----------------------+---------------------------------+
| incr_comp_encode_dep_graph                                              | 10.56ms   | 4.445           | 10.56ms  | 108763     | 0.00ns                | 0.00ns                          |
+-------------------------------------------------------------------------+-----------+-----------------+----------+------------+-----------------------+---------------------------------+
| encode_query_results_for                                                | 10.34ms   | 4.349           | 10.34ms  | 55         | 0.00ns                | 0.00ns                          |
+-------------------------------------------------------------------------+-----------+-----------------+----------+------------+-----------------------+---------------------------------+
| expand_crate                                                            | 10.28ms   | 4.325           | 33.33ms  | 1          | 0.00ns                | 0.00ns                          |
+-------------------------------------------------------------------------+-----------+-----------------+----------+------------+-----------------------+---------------------------------+
| incr_comp_load_dep_graph                                                | 9.31ms    | 3.917           | 9.31ms   | 1          | 0.00ns                | 0.00ns                          |
+-------------------------------------------------------------------------+-----------+-----------------+----------+------------+-----------------------+---------------------------------+

This matches my intuition.

Is there any way to figure out which struct/trait is causing all the monomorphization?


If you convert to Chrome format and use -Z self-profile-events=all, you should get more detail (IIUC). I don't have a complex crate easily to hand to test on (I used my quick-and-dirty Advent of Code workspace for the initial testing), but it should give you more details in "Arguments" when you click on a slow slice.

  1. I got summarize, flamegraph, and crox to all work.

  2. This is very helpful and I'm grateful to you for sharing.

  3. I have one last problem: crox shows the total runtime at ~300ms, whereas cargo build --timings shows it at 1.5-1.8s. That is a 5-6x difference in measured time. Have you run into this issue?


I'm afraid I only ever needed to look at relative numbers, not absolute; when I've used it before, I was using it to track down something that had gone from taking a few seconds to taking 5 minutes (roughly) to build, and once I'd found the macro whose expansion was supralinear in its input and fixed it, I was done.
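For readers who have not met a supralinear macro, here is a small illustrative sketch. It is a hypothetical macro, not the one from that project: each recursion step emits one statement per remaining argument, so n arguments expand to n(n-1)/2 statements.

```rust
// Hypothetical macro: pushes every ordered pair (a, b) where `a` comes
// before `b` in the argument list. The expansion grows quadratically.
macro_rules! all_pairs {
    // Base case: no items left, expand to nothing.
    ($out:ident;) => {};
    // Pair the head with every item in the tail, then recurse on the tail.
    ($out:ident; $head:expr $(, $tail:expr)*) => {
        $( $out.push(($head, $tail)); )*
        all_pairs!($out; $($tail),*);
    };
}

fn build_pairs() -> Vec<(i32, i32)> {
    let mut pairs = Vec::new();
    // 4 arguments expand to 3 + 2 + 1 = 6 push statements.
    all_pairs!(pairs; 1, 2, 3, 4);
    pairs
}

fn main() {
    let pairs = build_pairs();
    println!("{} pairs: {:?}", pairs.len(), pairs);
}
```

With a handful of arguments this is harmless, but the same shape applied to hundreds of inputs produces tens of thousands of statements for the compiler to process, which is the kind of growth a profile of the macro expansion phase can reveal.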


On my phone right now, so I can't test this: do those timings include the final link time? Are you on a weird OS (e.g. Windows) with some intrusive services (e.g. antivirus) running that could be throwing things off?

How do I test this? I could never figure out compile vs link on cargo build --timings. In a workspace, does link time hit anything other than the last/final/root crate? (This is an intermediate crate.)

I am on NixOS Linux, target wasm32-unknown-unknown. AFAIK nothing else (besides Chrome + IntelliJ IDEA) is running.

Unfortunately I have no idea then, perhaps someone else knows more about this.
