How to force cargo to honor the specified number of parallel compile jobs?

When I pass a large job count to cargo, e.g., -j30 on my 10-core machine, cargo runs fewer than 30 jobs in parallel even though there are 50 crates to compile. I observed that cargo does not start more jobs until some already-running job finishes. Is there a way to force cargo to actually run the specified number of jobs?

I know this sounds unreasonable, since it is not the most efficient way to do the work, but I need it for a special task.

A crate can't be compiled until all of its dependencies have finished emitting crate metadata. You can think of the crate metadata as a kind of binary header file that the compiler generates from the source code.

Cargo's --timings option will give you a report that can help you understand where things are waiting.

https://doc.rust-lang.org/cargo/reference/timings.html

Thanks. This makes sense to me. I assume the crate metadata is written to some temporary files so that other compile processes can read it, correct? Can you point me to where rustc generates the crate metadata? I guess it's before the LLVM codegen, but I'd like to know exactly where it creates those files.

Indeed. It is stored in the .rmeta files next to the .rlib files with the object code. (Technically, .rlib files also contain a copy of the metadata for backward-compatibility reasons, but that copy isn't used if a .rmeta exists.) It is done by the call at

just before codegen starts.

Thanks! To confirm I understand correctly: when we give cargo a large job count, it decides whether to start a new compile process (say, for crate foo) by checking whether the crate metadata of all of foo's dependencies exists as metadata files, which are generated by rustc_metadata::fs::encode_and_write_metadata(), and cargo does not care whether the dependency crates have been lowered to LLVM IR, correct?

In other words, if I need cargo to start a new compile process, all I need to do is make sure the metadata of its dependencies has been created.

Correct. In the past it used to be necessary to wait for rustc to finish entirely, but that changed to the current scheme with the introduction of "pipelined compilation" (see Implement "pipelined" rustc compilation · Issue #6660 · rust-lang/cargo · GitHub). The only exception is when a crate needs to be linked, like when it produces an executable or dynamic library. In that case it needs to wait for all dependencies to be completely compiled.
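The readiness rule described here can be sketched as a small model. This is only an illustration under stated assumptions, not Cargo's actual scheduler: the `Kind`, `State`, `Unit`, `can_start`, and `ready_units` names and the three-unit graph are all made up for the example. A library unit may start once its direct dependencies have metadata; a linked unit waits until every direct and indirect dependency is fully compiled.

```rust
use std::collections::BTreeMap;

/// What a unit produces, and therefore what it needs from its dependencies.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum Kind {
    /// Library crate: only needs the metadata (.rmeta) of its dependencies.
    Lib,
    /// Linked output (executable, dynamic library): needs complete dependencies.
    Linked,
}

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum State {
    NotStarted,
    Running,       // rustc started, metadata not written yet
    MetadataReady, // metadata written, codegen still in progress
    Finished,      // fully compiled
}

struct Unit {
    kind: Kind,
    deps: Vec<&'static str>,
}

type Graph = BTreeMap<&'static str, Unit>;
type States = BTreeMap<&'static str, State>;

/// Collects the direct and indirect dependencies of `name` into `out`.
fn transitive_deps(name: &str, graph: &Graph, out: &mut Vec<&'static str>) {
    for &dep in &graph[name].deps {
        if !out.contains(&dep) {
            out.push(dep);
            transitive_deps(dep, graph, out);
        }
    }
}

/// Returns true if `name` has not started yet and its inputs are available.
fn can_start(name: &str, graph: &Graph, states: &States) -> bool {
    if states[name] != State::NotStarted {
        return false;
    }
    let unit = &graph[name];
    match unit.kind {
        Kind::Lib => unit
            .deps
            .iter()
            .all(|dep| matches!(states[dep], State::MetadataReady | State::Finished)),
        Kind::Linked => {
            let mut all = Vec::new();
            transitive_deps(name, graph, &mut all);
            all.iter().all(|dep| states[dep] == State::Finished)
        }
    }
}

/// Lists every unit that could be started right now, in name order.
fn ready_units(graph: &Graph, states: &States) -> Vec<&'static str> {
    graph
        .keys()
        .copied()
        .filter(|name| can_start(name, graph, states))
        .collect()
}

/// Hypothetical graph: app (binary) -> b (lib) -> a (lib).
fn example_graph() -> Graph {
    let mut graph = Graph::new();
    graph.insert("a", Unit { kind: Kind::Lib, deps: vec![] });
    graph.insert("b", Unit { kind: Kind::Lib, deps: vec!["a"] });
    graph.insert("app", Unit { kind: Kind::Linked, deps: vec!["b"] });
    graph
}

fn main() {
    let graph = example_graph();
    let mut states: States = graph.keys().map(|&k| (k, State::NotStarted)).collect();

    states.insert("a", State::Running);
    println!("a running:        {:?}", ready_units(&graph, &states));
    states.insert("a", State::MetadataReady);
    println!("a has metadata:   {:?}", ready_units(&graph, &states));
    states.insert("b", State::MetadataReady);
    println!("b has metadata:   {:?}", ready_units(&graph, &states));
    states.insert("a", State::Finished);
    states.insert("b", State::Finished);
    println!("a and b finished: {:?}", ready_units(&graph, &states));
}
```

In this model, `b` becomes ready as soon as `a` has metadata, while `app` stays blocked for as long as any library is still in the `MetadataReady` state.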

Hi @bjorn3, I still have trouble with this issue and hope you can give me some help. After the call to rust/passes.rs at 47d1cdb0bcac8e417071ce1929d261efe2399ae2 · rust-lang/rust · GitHub, I run my own analysis code on MIR, and my code has an infinite loop that waits for some temporal analysis results from other crates. The whole program has 35 crates and I have set the parallel job count to 100. I saw that cargo started 25 compile processes and all of them were stuck in the loop, while 21 .rmeta files had been generated in target/release/deps. My understanding from the previous discussion is that this infinite loop should not block cargo from starting more compile processes, because the crate's metadata should already have been generated before the loop. But that does not seem to be the case. Any thoughts on why this happens? It may have something to do with libc, log, and pkg-config, because I don't see any .rmeta generated for them even though they are being compiled (stuck in the infinite loop).

Cargo is likely waiting for codegen of all crates to be finished before it runs rustc for the final binary crate. Binary crates need all dependencies to be fully compiled as they link the object files of dependencies.

Thanks for the reply. I believe this is not the case, because there are 35 crates and cargo started only 25 compile processes in total. I found that the .rmeta files for the libc, pkg-config, and log crates are missing, although rustc is compiling these crates. It looks like rustc just skipped metadata generation for them. They are not in target/release/deps, but I saw directories for log and libc created under target/release/build, and each of these directories has an out directory that is empty. Any idea why these few crates are treated specially?

These crates have build scripts which are binaries. These build scripts need to run first before cargo can run rustc on the main lib crates of the respective packages.

Thanks for pointing this out. Two issues. First, I checked the source code of pkg-config and did not see a build script (GitHub - rust-lang/pkg-config-rs: Build library for invoking pkg-config for Rust). I guess I missed something?

Second, I found the behavior of the metadata generation function encode_and_write_metadata() confusing. I added some debug printing to this function and found that, before the metadata for a crate is actually generated, the call to encode_metadata() may divert execution to somewhere else, do a lot of other work, and then come back to generate the metadata. That happens for some crates; for others there seems to be no diversion. I'm confused about why this happens.

Right, pkg-config is used as dependency of a build script.
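The ordering implied by the last two replies can be sketched as follows. This is a simplified illustration, not Cargo's real unit graph: the `Package` and `Step` types, the `steps_for` function, and the package name `native-sys` are hypothetical. The point is that a build script is a linked binary, so its build-dependencies must be fully compiled before it is built, and it must run before rustc is invoked on the package's library.

```rust
/// One step Cargo has to perform for a package (hypothetical model).
#[derive(Debug, PartialEq, Eq, Clone)]
enum Step {
    /// Fully compile a build-dependency (metadata alone is not enough).
    CompileBuildDep(String),
    /// Compile and link build.rs into an executable.
    CompileBuildScript(String),
    /// Run the build script executable.
    RunBuildScript(String),
    /// Compile the package's library crate; this is what emits its .rmeta.
    CompileLib(String),
}

/// Hypothetical description of a package.
struct Package {
    name: String,
    has_build_script: bool,
    build_deps: Vec<String>,
}

/// Returns the steps for `pkg` in the order they must complete.
fn steps_for(pkg: &Package) -> Vec<Step> {
    let mut steps = Vec::new();
    if pkg.has_build_script {
        for dep in &pkg.build_deps {
            steps.push(Step::CompileBuildDep(dep.clone()));
        }
        steps.push(Step::CompileBuildScript(pkg.name.clone()));
        steps.push(Step::RunBuildScript(pkg.name.clone()));
    }
    steps.push(Step::CompileLib(pkg.name.clone()));
    steps
}

fn main() {
    // Hypothetical package whose build script uses pkg-config.
    let pkg = Package {
        name: "native-sys".to_string(),
        has_build_script: true,
        build_deps: vec!["pkg-config".to_string()],
    };
    for step in steps_for(&pkg) {
        println!("{:?}", step);
    }
}
```

Under this model, the library's metadata cannot appear until the build script has been compiled and run, and the build script itself cannot be linked while any of its build-dependencies is still compiling.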

encode_metadata is what actually generates the raw bytes of the metadata, but it doesn't write them. As for why it runs other, seemingly unrelated code: rustc has a pull-based query architecture instead of a push-based architecture like traditional compilers. This means that it lazily runs many passes when their results are needed rather than as soon as possible. For example, when encoding metadata needs optimized MIR, the MIR will be optimized within the call to encode_metadata if this hasn't already been done for the respective function.
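A toy model of this demand-driven behavior is shown below. It is a minimal sketch and not rustc's query system: `QueryCtx`, its fields, and both method signatures are invented for illustration and only borrow the names from the discussion. Each query caches its result, so the optimization work happens inside encode_metadata only for functions whose MIR was not requested earlier.

```rust
use std::collections::HashMap;

/// Toy query context (hypothetical; rustc's real query system differs).
struct QueryCtx {
    /// Memoized results of the `optimized_mir` query, keyed by function name.
    optimized_mir_cache: HashMap<String, String>,
    /// Records the order in which work actually happens.
    log: Vec<String>,
}

impl QueryCtx {
    fn new() -> Self {
        QueryCtx {
            optimized_mir_cache: HashMap::new(),
            log: Vec::new(),
        }
    }

    /// Query: optimized MIR for `func`, computed on first demand and then cached.
    fn optimized_mir(&mut self, func: &str) -> String {
        if let Some(cached) = self.optimized_mir_cache.get(func) {
            return cached.clone();
        }
        self.log.push(format!("optimize {}", func));
        let mir = format!("optimized MIR of {}", func);
        self.optimized_mir_cache.insert(func.to_string(), mir.clone());
        mir
    }

    /// Encodes metadata, pulling in optimized MIR for each listed function.
    fn encode_metadata(&mut self, funcs: &[&str]) -> Vec<String> {
        self.log.push("encode_metadata: start".to_string());
        let encoded: Vec<String> = funcs.iter().map(|f| self.optimized_mir(f)).collect();
        self.log.push("encode_metadata: done".to_string());
        encoded
    }
}

fn main() {
    let mut ctx = QueryCtx::new();
    // Something earlier in compilation already asked for foo's optimized MIR.
    ctx.optimized_mir("foo");
    // bar's MIR has not been requested yet, so it is optimized inside the call.
    ctx.encode_metadata(&["foo", "bar"]);
    for line in &ctx.log {
        println!("{}", line);
    }
}
```

The log shows `optimize bar` between the start and end of encode_metadata, while `foo` causes no extra work there because its result was already cached.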

Thanks very much for the explanation. It does match what I observed: some crates got processed in the middle of encode_metadata and some not.

I'm still confused about one thing. I believe that after the call to encode_and_write_metadata, the metadata for the crate currently being compiled should have been generated, correct? I put an infinite loop after this call and set a parallel job count much larger than the total number of crates. I expected that the infinite loop would not stop cargo from starting the compilation of new crates. However, the whole build still got stuck, as if some crates were still holding up the compilation of other crates, and this contradicts my previous assumption about metadata generation.

I confirmed that all the pending compile processes have had their metadata written to target/release/deps.