Limiting the number of concurrent LTO jobs without lowering compilation parallelism?

I have a workspace with many binaries, but I don't have enough RAM to run more than one LTO job at once. Linking with LTO tries to link several of them at once, runs out of memory, and everything grinds to a halt until the OS kills ld and cargo.

OTOH if I run cargo build -j1, it takes forever to compile all the dependencies one by one.

Is there a way to keep full parallelism for building crates, but force LTO linking steps to be done one by one?
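(For context, I'm assuming the usual kind of profile here; the exact settings below are illustrative, not copied from my workspace:)

```toml
# Cargo.toml -- assumed setup, shown only for illustration
[profile.release]
lto = true        # "fat" cross-crate LTO; this is the memory-hungry link step
codegen-units = 1 # illustrative; makes each LTO link even more expensive
```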


I don't think there is a way to do that. The jobserver protocol doesn't provide a way for a process to request all job tokens, which would be necessary to prevent any other build process from running at the same time; you can only ask for a specific number of tokens. If you want to limit the LTO step of a single process to a single thread while still allowing other processes to run, -Zno-parallel-llvm disables all parallelism within the codegen backend of a single rustc process.
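Since it's a -Z flag, that requires nightly; it would be passed along these lines (and note it may slow down codegen overall, since it applies to every rustc invocation in the build):

```sh
RUSTFLAGS="-Zno-parallel-llvm" cargo +nightly build --release
```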

Purely for my curiosity (since I'm not currently volunteering to implement anything to help), when you say "jobserver protocol", do you mean the GNU make jobserver protocol?

If so, would this problem be mitigated if cargo, as the root jobserver, measured available memory and provided an equivalent of GNU make's --max-load option, requiring a certain minimum amount of available memory before allowing a process to claim a token, unless there are no child processes running?

There's a lot of detail that would have to be considered. The obvious problem I can see in my design is that processes don't claim all the memory they're going to use immediately on startup, so we'd need to consider the time since the last token was issued as well as available memory to keep parallelism high without exhausting memory.
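A minimal sketch of what such a gate might look like, assuming a Linux-only MemAvailable check; none of these names or thresholds exist in cargo or the jobserver crate, they're purely illustrative:

```rust
use std::fs;

/// Available memory in bytes, if it can be determined (Linux only).
fn available_memory_bytes() -> Option<u64> {
    let meminfo = fs::read_to_string("/proc/meminfo").ok()?;
    // The relevant line looks like "MemAvailable:   12345678 kB".
    let line = meminfo
        .lines()
        .find_map(|l| l.strip_prefix("MemAvailable:"))?;
    let kib: u64 = line.trim().trim_end_matches("kB").trim().parse().ok()?;
    Some(kib * 1024)
}

/// Would the root jobserver hand out another token right now?
/// `min_free_bytes` plays the role of a --max-load style threshold.
fn may_issue_token(running_children: usize, min_free_bytes: u64) -> bool {
    if running_children == 0 {
        // Never stall completely: one job is always allowed to run.
        return true;
    }
    available_memory_bytes().map_or(true, |free| free >= min_free_bytes)
}
```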

See Limiting the parallelism automatically · Issue #12912 · rust-lang/cargo · GitHub

Basically, there are two issues:

  • The jobserver manages CPU consumption, not memory; those are distinct resources.
  • Oversubscribed CPUs are non-fatal, OOM is. So the former is much more forgiving about transient excesses.

If we want to manage memory consumption, we need a new mechanism. Or, if that's too difficult, treat the whole thing as fallible and add some recovery.
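As a very rough illustration of the "fallible plus recovery" option, something like this could wrap each link job; this is entirely hypothetical, not something cargo does today:

```rust
use std::io;
use std::process::Command;
use std::{thread, time::Duration};

/// Run one link job; if the child is killed by a signal (which is what an
/// OOM kill looks like from the parent's perspective), back off and retry
/// instead of failing the whole build.
fn run_link_job_with_retry(mut make_cmd: impl FnMut() -> Command) -> io::Result<()> {
    loop {
        let status = make_cmd().status()?;
        if status.success() {
            return Ok(());
        }
        if status.code().is_none() {
            // Killed by a signal: wait for memory pressure to ease,
            // then retry this job on its own.
            thread::sleep(Duration::from_secs(10));
            continue;
        }
        return Err(io::Error::new(
            io::ErrorKind::Other,
            format!("link job failed: {status}"),
        ));
    }
}
```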

That's not a thing. The jobserver is distributed, like a single shared semaphore that all processes access through the same API. The processes don't negotiate individually with a central broker over whether they can have a token.
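A minimal sketch of that shared-semaphore model, using the jobserver crate that cargo and rustc build on; the token count and the job function are placeholders:

```rust
use jobserver::Client;

fn do_one_compilation_job() { /* placeholder for real work */ }

fn main() -> std::io::Result<()> {
    // The root process creates the semaphore with N tokens...
    let client = Client::new(8)?;

    // ...and each participant simply blocks until a token is free, then
    // releases it when the guard is dropped. No central broker decides
    // per-request whether a particular process may have a token.
    let _token = client.acquire()?;
    do_one_compilation_job();
    Ok(())
}
```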


Is Cargo not aware when the linker is invoked? cargo build --verbose shows the invocation. Is that coming from rustc?

I thought it could be entirely independent and unrelated to jobservers. First Cargo could build the dependencies and all the lib targets, and then serialize the rest of the build graph.


Rustc invokes the linker; Cargo doesn't know anything about this. I don't see the linker invocation with cargo build -v. Also, in rustc, LTO is not done by the linker but by rustc itself (unless -Clinker-plugin-lto is used).
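For completeness, linker-plugin LTO is opted into with something along these lines (per the rustc book; the specific linker choice here is just an example):

```sh
# Defers LTO to an LLVM-based linker instead of having rustc perform it.
RUSTFLAGS="-Clinker-plugin-lto -Clinker=clang -Clink-arg=-fuse-ld=lld" \
    cargo build --release
```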
