dpc
April 2, 2025, 9:06pm
1
Our team is working on a relatively large Rust project, with a lot of chunky artifacts being produced as the result of the build, and we've started to run into issues where builds run out of memory on certain systems. Too many binaries are being linked at the same time, each requiring e.g. 7GB of memory (lots of code, LTO, release build).
This makes it difficult to script/package Rust code for general-purpose consumption. We can add `-j 1`, but that makes the build unnecessarily slow and leaves memory underutilized. We can lower the build settings, but that just delays the problem.
Ideally, there would be some way to cap memory usage. E.g. we could specify via env vars / arguments / cargo config that we expect a single crate to need a certain amount of memory, and Rust would adjust the effective `-j` based on the free memory available on the system. On systems with enough capacity, all cores would be utilized; on systems with relatively little memory compared to the number of cores, fewer.
Currently, we are considering just adding some shell scripting to calculate a safe `-j` based on the outputs of `nproc` and `free`.
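Roughly, what I have in mind is something like the sketch below (a Linux-only illustration, assuming a hypothetical 7 GiB per-job estimate and reading `MemAvailable` from `/proc/meminfo` rather than calling `free`):

```rust
// Hypothetical wrapper: derive a safe `-j` from available memory, assuming each
// concurrent rustc/linker job needs roughly PER_JOB_GIB of RAM. Linux-only.
use std::fs;
use std::process::Command;
use std::thread;

const PER_JOB_GIB: u64 = 7; // rough per-job estimate; tune per project

fn mem_available_gib() -> u64 {
    // Parse the MemAvailable line (reported in kB) from /proc/meminfo.
    fs::read_to_string("/proc/meminfo")
        .unwrap_or_default()
        .lines()
        .find(|l| l.starts_with("MemAvailable:"))
        .and_then(|l| l.split_whitespace().nth(1))
        .and_then(|kb| kb.parse::<u64>().ok())
        .map(|kb| kb / (1024 * 1024))
        .unwrap_or(0)
}

fn main() {
    let cores = thread::available_parallelism().map(|n| n.get()).unwrap_or(1) as u64;
    let by_memory = (mem_available_gib() / PER_JOB_GIB).max(1);
    let jobs = cores.min(by_memory);

    // Cap the build's parallelism at whatever memory allows.
    let status = Command::new("cargo")
        .arg("build")
        .arg("--release")
        .arg("-j")
        .arg(jobs.to_string())
        .status()
        .expect("failed to run cargo");
    std::process::exit(status.code().unwrap_or(1));
}
```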
LLVM's build scripts have options to limit the number of link jobs run in parallel, either by capping them at a specified number (e.g. `LLVM_PARALLEL_LINK_JOBS`) or by calculating a cap from available RAM. Maybe Cargo should gain options for that?
Automatically limiting memory consumption is much harder than limiting CPU consumption because CPU use relatively closely corresponds to threads (as long as the fraction of time spent in IO or acquiring locks is low, which is the case for rustc). It's even possible to reactively limit CPU use by interleaving multiple threads on a physical core or limiting their scheduler budget, so the compilation process can still make progress.
Memory has none of that ease. You can't really predict how much memory each crate will take. And it's also not possible to budget memory because once the budget runs out then there's no practical way to claw it back from process A to let process B make progress... not without either using swap or killing the process.
There are approaches that might work some of the time, but they wouldn't be as reliable as limiting CPU utilization.
opened 05:17AM - 03 Nov 23 UTC · C-feature-request · S-needs-design · A-jobserver
**tl;dr:** Introduce a simple mechanism for limiting parallelism automatically in Cargo, to avoid consuming all system resources during the compilation.
## Problem
Cargo by default uses all cores ([`std::thread::available_parallelism`](http://doc.rust-lang.org/1.73.0/std/thread/fn.available_parallelism.html)) and spawns off `rustc` or build scripts onto each core. This is not an issue when compiling on a decent machine. When working on low-end machines or large-scale codebases, developers often encounter issues like extremely high CPU load or out-of-memory errors.
To solve these problems, developers can set `--jobs` on the command line, or `build.jobs` in `.cargo/config.toml`, to control the maximum parallelism Cargo can use. This is not ideal because:
* `build.jobs` is bound to the number of cores in use. It does not directly correlate with memory usage. Parallel builds might run out of memory before any CPU throttling happens, especially when several linker invocations run at once.
* `build.jobs` assigns cores fairly to each unit of work (i.e. a crate build). However, some crate builds consume more computing resources than others. If those crate builds are bottlenecks of the entire compilation, we might want to throw more resources at them to unblock other crate builds.
* Developers need to set `build.jobs` explicitly to control the parallelism, and it often takes a long trial-and-error process to figure out a proper value. The value also varies between environments. Not really user-friendly.
* Developers might want full control over every dependency build. `build.jobs` is too coarse-grained.
## An "ideal" approach (but not now)
There are a couple of existing proposals trying to improve the situation. Some of them want to assign a weight to a certain job, or tag jobs into groups. With weights and tags, the job scheduler understands whether it should start a job. This is pretty much the ideal solution, as it gives developers maximum control over parallelism, and the system could be extended to job-scheduling optimizations.
However, such a system requires developers to fully understand the entire compilation of their projects. For now, the data is either missing or hard to get from Cargo. To incrementally build the system, there are prerequisites:
* Cargo can monitor the resource usage of the system and each unit of work during a build.
* Cargo can persist the resource usage of each unit of work for each build.
## Start small
We should start small and focus on monitoring resource usage, additionally limiting parallelism when usage exceeds a threshold.
Some options:
* Assign the maximum amount of resources that Cargo can use. This is how `build.jobs` works now. We might need an equivalent for memory usage. Something like
```toml
[build.limit]
local-memory = "3GiB" # or "95%" or "100% - 200MiB"
```
* Set a system-wide threshold. Cargo won't start any new job and will wait for overall system usage to go down, even when Cargo's own usage is still under its assigned maximum.
```toml
[build.limit]
system = "3GiB" # or "95%" or "100% - 200MiB"
cpu = "100%"
```
To minimize the impact of bad data points, these metrics would be sampled and averaged over a period of time.
Instead of "usage", we could also leverage the concept of "load average" from Unix-like systems, which might make more sense for managing compute load.
I honestly don't know which of these we want: one, both, or neither.
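As a rough illustration of the load-average flavor (not a proposed Cargo API; the threshold here is a made-up number and `/proc/loadavg` is Linux-specific), the gate before spawning a new unit of work could look like:

```rust
// Sketch of a make-style `--max-load` gate: sample the 1-minute load average
// and hold off on spawning new work while it is above a threshold.
use std::fs;
use std::thread::sleep;
use std::time::Duration;

const MAX_LOAD: f64 = 8.0; // made-up threshold, e.g. the number of cores

fn load_average_1min() -> f64 {
    // /proc/loadavg starts with the 1-, 5- and 15-minute load averages.
    fs::read_to_string("/proc/loadavg")
        .ok()
        .and_then(|s| s.split_whitespace().next().and_then(|v| v.parse().ok()))
        .unwrap_or(0.0)
}

/// Block until the sampled load drops below the threshold, then let the
/// caller start the next job (compile, build script, link, ...).
fn wait_for_capacity() {
    while load_average_1min() >= MAX_LOAD {
        sleep(Duration::from_millis(500));
    }
}

fn main() {
    wait_for_capacity();
    println!("load is below {MAX_LOAD}, ok to start the next job");
}
```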
## Library to use
* [`procfs`](https://crates.io/crates/procfs) — used by the wider Rust web-dev community, via Prometheus and other metrics crates.
* [`sysinfo`](https://crates.io/crates/sysinfo) — another popular crate for inspecting system info.
Both of them introduce an excessive amount of code Cargo doesn't need at this moment.
Alternatively, we could call the relevant syscalls directly to get this information.
## Prior art
* Bazel
* `--jobs`
* [`--local_{ram,cpu}_resources`](https://bazel.build/docs/user-manual#local-resources) to assign resources a build can use
* Buck
* `--jobs`
* [`link_weight`](https://buck2.build/docs/legacy/files-and-directories/dot-buckconfig/#link_weight) to configure how many jobs a link job counts as.
* Cabal
* `--jobs`
* Got the same linker invocation issue <https://github.com/haskell/cabal/issues/1529>.
* CMake
* `-j` to set max number of concurrent processes
* GitHub Actions
* has `concurrency.group`
* Go
* `go build -p` limits the number of programs, such as build commands or test binaries, that can be run in parallel.
* `GOMAXPROCS` to limit the number of OS threads that can execute user-level Go code simultaneously.
* Gradle
* `--max-workers` — like `--jobs`
* Has a `SharedResourceLeaseRegistry` for registering a resource with a maximum number of leases, like a semaphore.
* Parallelism can be configured per-project on demand.
* make
* `-j` to set max number of concurrent jobs
* `--max-load` to limit the start of a new job if load average goes above the value
* Read [Parallel](https://www.gnu.org/software/make/manual/make.html#Parallel) for more
* Ninja
* has a [pool](https://ninja-build.org/manual.html#ref_pool) concept that lets users assign some build stages to a pool with more restrictive parallelism rules.
* Nix
* [`max-jobs`](https://nixos.org/manual/nix/stable/command-ref/conf-file.html#conf-max-jobs)
* sbt
* [`tasks` are tagged](https://www.scala-sbt.org/1.x/docs/Parallel-Execution.html), and each tag gets a default weight used for resource restriction.
## Related issues
There are more issues regarding scheduling, but I don't want to link to all of them here. These are issues where people are trying to tell Cargo not to be that greedy.
* https://github.com/rust-lang/cargo/issues/7480
* https://github.com/rust-lang/cargo/issues/8405
* https://github.com/rust-lang/cargo/issues/8556
* https://github.com/rust-lang/cargo/issues/9157
* https://github.com/rust-lang/cargo/issues/9250
* https://github.com/rust-lang/cargo/issues/11707
* https://github.com/rust-lang/cargo/issues/12916
* https://github.com/rust-lang/cargo/issues/14190
And sorry I opened a new issue instead. Feel free to close and move to any existing one.
> Too many binaries being linked at the same time
For that, one wouldn't have to limit build parallelism in general, just the linking phase.
opened 08:04PM - 09 Feb 21 UTC · A-linkage · C-feature-request · S-needs-design · A-jobserver
In CI at my work, we ran into a situation where rustc would get OOM-killed while linking example binaries:
```
error: linking with `cc` failed: exit code: 1
|
= note: "cc" <…>
= note: collect2: fatal error: ld terminated with signal 9 [Killed]
compilation terminated.
```
We were able to mitigate this by using a builder with more available memory, but it's unfortunate. We could dial down the parallelism of the whole build by explicitly passing `-jN`, but that would make the non-linking parts of the build slower by leaving CPU cores idle.
It would be ideal if we could explicitly ask cargo to lower the number of parallel linker invocations it will spawn. Compile steps are generally CPU-intensive, but linking is usually much more memory-intensive. In the extreme case, for large projects like Firefox and Chromium where the vast majority of code gets linked into a single binary, that link step *far* outweighs any other part of the build in terms of memory usage.
In terms of prior art, [ninja has a concept of "pools"](https://ninja-build.org/manual.html#ref_pool) that allow expressing this sort of restriction in a more generic way:
> Pools allow you to allocate one or more rules or edges a finite number of concurrent jobs which is more tightly restricted than the default parallelism.
> This can be useful, for example, to restrict a particular expensive rule (like link steps for huge executables), or to restrict particular build statements which you know perform poorly when run concurrently.
[The Ninja feature was originally motivated by Chromium builds switching to Ninja and wanting to support distributed builds](https://groups.google.com/g/ninja-build/c/6oJzx6QI0Vg), in which there might be capacity to spawn way more compile jobs in parallel since they can be run on distributed build nodes, but link jobs, needing to be run on the local machine, would want a lower limit.
If this were implemented, one could imagine a further step whereby cargo could estimate how heavy individual linker invocations are by the number of crates they link, and attempt to set a reasonable default value based on that and the amount of available system memory.
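For illustration only, the pool idea could be approximated with a counting semaphore that only link steps have to acquire, while compile steps stay unrestricted; the names here (`LinkPool`, the cap of 2) are made up and not anything cargo actually exposes:

```rust
// Sketch of a ninja-style "pool": cap concurrent link jobs with a counting
// semaphore built from Mutex + Condvar; compile jobs would bypass the pool.
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
use std::time::Duration;

struct LinkPool {
    slots: Mutex<usize>,
    freed: Condvar,
}

impl LinkPool {
    fn new(max: usize) -> Arc<Self> {
        Arc::new(Self { slots: Mutex::new(max), freed: Condvar::new() })
    }

    /// Run `job` while holding one of the pool's slots.
    fn run<F: FnOnce()>(&self, job: F) {
        let mut free = self.slots.lock().unwrap();
        while *free == 0 {
            free = self.freed.wait(free).unwrap();
        }
        *free -= 1;
        drop(free);

        job(); // e.g. spawn the linker and wait for it to finish

        *self.slots.lock().unwrap() += 1;
        self.freed.notify_one();
    }
}

fn main() {
    let pool = LinkPool::new(2); // at most two "link" jobs at a time
    let handles: Vec<_> = (0..6)
        .map(|i| {
            let pool = Arc::clone(&pool);
            thread::spawn(move || {
                pool.run(|| {
                    // Stand-in for a memory-hungry linker invocation.
                    thread::sleep(Duration::from_millis(200));
                    println!("link job {i} done");
                })
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
}
```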
dpc
April 3, 2025, 8:51pm
4
That's why I'm leaning towards giving project developers the ability to customize the estimate of memory needed for a single job/crate/linker invocation via cargo config.
That's OK, it's still worthwhile to try to do it.
However, hopefully memory technology will keep giving us more and more memory even on low-end devices, so maybe it's not worth doing: projects hitting these limits are not very common, and the memory a build needs will keep shrinking relative to what's available.
I already rolled out a band-aid wrapper script that tweaks `-j` based on the system's memory, and that should be good enough for us.