Systemd killing Rust compiles on Linux

This is really a Linux bug, but it affects Rust compiles. Randomly, Rust compiles are killed by systemd, which decides that the Rust compiler is using too much memory. Systemd makes these log entries:

Jun 14 19:35:30 user-desktop systemd-oomd[2335671]: Killed /user.slice/user-1001.slice/user@1001.service/app.slice/app-org.gnome.Terminal.slice/vte-spawn-8b4db362-3cd2-439a-84d6-6a119f700bcc.scope due to memory pressure for /user.slice/user-1001.slice/user@1001.service being 50.95% > 50.00% for > 20s with reclaim activity
Jun 14 19:35:30 user-desktop systemd[1698]: vte-spawn-8b4db362-3cd2-439a-84d6-6a119f700bcc.scope: systemd-oomd killed 9 process(es) in this unit.
Jun 14 19:35:30 user-desktop systemd[1698]: vte-spawn-8b4db362-3cd2-439a-84d6-6a119f700bcc.scope: Consumed 2w 13h 30min 24.000s CPU time.
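
For anyone collecting these kills from the journal, here is a minimal sketch that pulls the three numbers out of an entry shaped like the first one above: the measured pressure, the limit, and the duration. The struct and function names are made up for illustration, and the parser assumes the "being X% > Y% for > Ns" wording seen in this log; other systemd versions may word the message differently.

```rust
/// Numbers reported in a systemd-oomd "Killed ... due to memory pressure" entry.
/// (Hypothetical type, named here only for this sketch.)
#[derive(Debug, PartialEq)]
struct OomdKill {
    measured_pct: f64,
    limit_pct: f64,
    duration_secs: u64,
}

/// Parses the "being X% > Y% for > Ns" tail of a log entry.
/// Returns None if the line does not have that shape.
fn parse_oomd_kill(line: &str) -> Option<OomdKill> {
    // Everything after " being " holds the numbers we want.
    let tail = line.split(" being ").nth(1)?;
    let mut words = tail.split_whitespace();

    let measured_pct = words.next()?.trim_end_matches('%').parse().ok()?;
    if words.next()? != ">" {
        return None;
    }
    let limit_pct = words.next()?.trim_end_matches('%').parse().ok()?;
    if words.next()? != "for" {
        return None;
    }
    if words.next()? != ">" {
        return None;
    }
    let duration_secs = words.next()?.trim_end_matches('s').parse().ok()?;

    Some(OomdKill {
        measured_pct,
        limit_pct,
        duration_secs,
    })
}

fn main() {
    // Shortened copy of the entry above; `example.scope` is a placeholder
    // for the long vte-spawn scope path.
    let line = "Killed example.scope due to memory pressure for \
                /user.slice/user-1001.slice/user@1001.service \
                being 50.95% > 50.00% for > 20s with reclaim activity";
    println!("{:?}", parse_oomd_kill(line));
}
```

Note that the message reports memory pressure, not memory usage, so the 50% figure is not necessarily "half of RAM".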

This machine has 32GB of RAM and 6 CPUs, so "cargo build --release" tries to use all that hardware. But systemd apparently is unhappy with one process group using more than half the memory for more than 20 seconds.

What seems to contribute to the problem is IDrive, the backup program. IDrive periodically looks at a lot of files to see if they changed. So, on an idle system, most of memory gets used as file cache. That can be "reclaimed" when something actually needs memory. But systemd doesn't like too much of that.

Should the compiler scheduler not be so aggressive about taking over the whole machine?

It seems like you need to adjust your oomd settings, or tell cargo to use fewer build jobs. I use earlyoom on a Core i3 with 4GB of RAM and I never have rustc get killed, although I also have a per-terminal I/O limit configured, which probably helps.


I think it's the combo of IDrive accessing too many files, how Linux handles recent I/O, Rust trying to use the whole machine, and systemd being too aggressive. There are several ways I could throttle down the resources Rust will use compiling, but this only happens about once a week and the rest of the time compiles would be slower.

I would suggest not backing up or monitoring the build directory (i.e. target). You might also want to exclude some of the cache dirs from ~/.cargo


That is the desired behavior on build servers/CI. And even on desktop machines it's still useful to have it use up most of the free resources and only leave a little headroom.
Limiting CPU resources is easy and can be done dynamically. Limiting memory consumption is much more difficult. Once cargo has spawned a rustc instance it has essentially committed to gobbling up an unknown amount of memory, because crate complexity varies and it can't back out. It'd be possible to stop compiling additional crates, but we can't take back committed work (without wasting progress by killing rustc).
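
As a rough illustration of what limiting by memory could look like from the outside, here is a minimal sketch that picks a job count before the build starts and prints a `cargo build --jobs N` command. It is not how cargo schedules work. It reads `MemAvailable` from `/proc/meminfo` and divides by a per-rustc memory budget; the 2 GiB budget is a pure guess (as noted above, real usage per crate is unknown), and the function names are made up for this example.

```rust
use std::fs;

/// Extracts MemAvailable (in KiB) from the text of /proc/meminfo.
fn mem_available_kib(meminfo: &str) -> Option<u64> {
    meminfo
        .lines()
        .find_map(|l| l.strip_prefix("MemAvailable:"))
        .and_then(|rest| rest.split_whitespace().next())
        .and_then(|n| n.parse().ok())
}

/// Picks a job count so that `jobs * per_job_kib` fits in the available
/// memory, never fewer than 1 and never more than the CPU count.
fn pick_jobs(available_kib: u64, per_job_kib: u64, cpus: usize) -> usize {
    let by_memory = (available_kib / per_job_kib.max(1)) as usize;
    by_memory.clamp(1, cpus.max(1))
}

fn main() {
    // ASSUMPTION: 2 GiB per rustc process. This is a guess, not a measurement.
    const PER_JOB_KIB: u64 = 2 * 1024 * 1024;

    let meminfo = fs::read_to_string("/proc/meminfo").unwrap_or_default();
    let cpus = std::thread::available_parallelism()
        .map(|n| n.get())
        .unwrap_or(1);

    match mem_available_kib(&meminfo) {
        Some(avail) => {
            let jobs = pick_jobs(avail, PER_JOB_KIB, cpus);
            println!("cargo build --release --jobs {jobs}");
        }
        None => eprintln!("MemAvailable not found; leaving the job count alone"),
    }
}
```

This only decides once, up front. It cannot react if a single crate turns out to need far more than the budget, which is exactly the difficulty described above.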

Relevant cargo issue: Limiting the parallelism automatically · Issue #12912 · rust-lang/cargo · GitHub

It's a systemd problem.

Ref: How do I disable the systemd OOM process killer in Ubuntu 22.04? - Ask Ubuntu

Ref: systemd built-in OOM killer (*not* systemd-oomd) kills all processes in user session/scope unit upon kernel OOM and doesn't respect OOMPolicy= · Issue #25376 · systemd/systemd · GitHub

Search for "with reclaim activity" to see the many reports of this problem and the contradictory advice on fixing it.

The inherent problem is that 1) systemd does not like one process group using more than 50% of memory for tens of seconds, and 2) Rust compiles are set to use more than half of memory for tens of seconds. Hence, trouble.


Not an inherent systemd problem. You have a systemd-oomd running on your system that's specifically configured in that way. It's an optional, configurable part.
You can either disable it (which will lead to longer hangs during OOMs since the kernel's OOM-killer is more reluctant to spring into action) or adjust its configuration to allow a larger slice or wait longer until it kills.
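
Before changing the limit or the duration, it may help to see what pressure the kernel reports during a build. A minimal sketch, assuming a kernel with pressure stall information (PSI) enabled: it reads `/proc/pressure/memory`, or any file in the same format passed as the first argument, and prints the 10-second averages. Whether these are exactly the figures systemd-oomd compares against its limit is an assumption here, and `psi_avg10` is a name made up for this example.

```rust
use std::fs;

/// Returns the `avg10` value from the line of a PSI file that starts with
/// `kind` ("some" or "full"). Returns None if the line or field is missing.
fn psi_avg10(psi: &str, kind: &str) -> Option<f64> {
    psi.lines()
        .find(|l| l.split_whitespace().next() == Some(kind))?
        .split_whitespace()
        .find_map(|field| field.strip_prefix("avg10="))?
        .parse()
        .ok()
}

fn main() {
    // Default to the system-wide file; a path in the same format can be
    // given as the first argument instead.
    let path = std::env::args()
        .nth(1)
        .unwrap_or_else(|| "/proc/pressure/memory".to_string());

    match fs::read_to_string(&path) {
        Ok(text) => {
            for kind in ["some", "full"] {
                match psi_avg10(&text, kind) {
                    Some(v) => println!("{kind} avg10 = {v}%"),
                    None => println!("{kind}: no avg10 field found"),
                }
            }
        }
        Err(e) => eprintln!("cannot read {path}: {e}"),
    }
}
```

Running this in a loop while `cargo build --release` is going would show whether the pressure really sits near the 50% limit from the log for long stretches.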

systemd built-in OOM killer (not systemd-oomd) kills all processes in user session/scope unit upon kernel OOM and doesn't respect OOMPolicy= · Issue #25376 · systemd/systemd · GitHub

That seems to be a different issue. Your log clearly says systemd-oomd.


This does not reproduce on my system - systemd is very happy for my Rust process group to use 40 GiB for about 60 seconds on a machine with 64 GiB RAM. My systemd-oomd settings are configured to look at how much swap pressure a group is putting on the system, and even with 40 GiB in use by cargo, rustc, ld etc, there's still no swap pressure.
