Killing subprocesses of `std::process::Command`

I want to write a utility that spawns kubectl port-forward subprocesses and can cancel and restart them later. I'm currently on Windows but would like this to work cross-platform.

Problem

kubectl port-forward service/<service> <port> creates another kubectl subprocess which binds a port:

❯ : ps | where name == "kubectl.exe"
╭───┬───────┬─────────────┬──────┬─────────┬─────────╮
│ # │  pid  │    name     │ cpu  │   mem   │ virtual │
├───┼───────┼─────────────┼──────┼─────────┼─────────┤
│ 1 │ 34712 │ kubectl.exe │ 0.00 │ 17.2 MB │ 16.1 MB │  # parent
│ 2 │ 52016 │ kubectl.exe │ 0.00 │ 33.3 MB │ 28.0 MB │  # child (binding port 5434)
╰───┴───────┴─────────────┴──────┴─────────┴─────────╯

❯ : netstat -ano | findstr :5434
  TCP    127.0.0.1:5434         0.0.0.0:0              LISTENING       52016
  TCP    [::1]:5434             [::]:0                 LISTENING       52016

When pressing Ctrl+C in a terminal, both parent and child kubectl programs exit, freeing the port.

When I run the same program with std::process::Command, the kill function only kills the parent process; the child kubectl keeps the port open. The same happens when I kill the parent manually:

❯ : taskkill /PID 34712 /F
SUCCESS: The process with PID 34712 has been terminated.

❯ : ps | where name == "kubectl.exe"
╭───┬───────┬─────────────┬──────┬─────────┬─────────╮
│ # │  pid  │    name     │ cpu  │   mem   │ virtual │
├───┼───────┼─────────────┼──────┼─────────┼─────────┤
│ 1 │ 52016 │ kubectl.exe │ 0.00 │ 35.0 MB │ 30.3 MB │
╰───┴───────┴─────────────┴──────┴─────────┴─────────╯

❯ : netstat -ano | findstr :5434
  TCP    127.0.0.1:5434         0.0.0.0:0              LISTENING       52016
  TCP    [::1]:5434             [::]:0                 LISTENING       52016

Interestingly, when I exit the Rust program the child process is killed too, so the OS does have the capability to clean up orphaned "grandchild" processes.

Remark

At least on Windows you can query child processes:

Get-WmiObject Win32_Process | Where-Object { $_.ParentProcessId -eq 34712 } | Select-Object ProcessId, Name

ProcessId Name
--------- ----
    52016 kubectl.exe

On Linux you should be able to run ps --ppid <parent_pid> (haven't tested yet).

However I would prefer to rely on the OS to clean up "zombie" child processes if possible.

Shells are somehow able to do this and I don't think that all of them do this kind of process querying directly.

Question

I would like to know if there is a standard approach for this. This is a problem that every shell has had to solve but I am not able to find relevant info.

The general way of doing this in Unix, which shells would be using, is with process groups. Rust provides access to this via std::os::unix::process::CommandExt: if you call .process_group(0) then both kubectl processes will be put in a new process group with the ID of the parent kubectl. You can then kill the entire process group at once using kill or killpg from the libc crate.
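A minimal, Unix-only sketch of the process-group approach described above. It uses `sh -c "sleep 30 & wait"` as a stand-in for kubectl and its child, and it declares `kill` by hand rather than depending on the libc crate (which exposes the same function). The helper names `spawn_in_new_group` and `terminate_group` are made up for illustration.

```rust
use std::io;
use std::os::unix::process::CommandExt;
use std::process::{Child, Command};

// Declared by hand to keep the sketch dependency-free; the libc crate
// provides the same function as `libc::kill`.
unsafe extern "C" {
    fn kill(pid: i32, sig: i32) -> i32;
}

// SIGTERM has this value on Linux and macOS.
const SIGTERM: i32 = 15;

/// Spawn `cmd` as the leader of a new process group. Any processes it
/// forks later stay in that group unless they move themselves out.
fn spawn_in_new_group(cmd: &mut Command) -> io::Result<Child> {
    cmd.process_group(0).spawn()
}

/// Send SIGTERM to every process in the group led by `child`.
/// A negative pid passed to kill(2) addresses a whole process group.
fn terminate_group(child: &Child) -> io::Result<()> {
    let pgid = child.id() as i32;
    if unsafe { kill(-pgid, SIGTERM) } == 0 {
        Ok(())
    } else {
        Err(io::Error::last_os_error())
    }
}

fn main() -> io::Result<()> {
    // Stand-in for kubectl: a parent `sh` that starts a background
    // `sleep` (the "grandchild") and then waits for it.
    let mut child = spawn_in_new_group(Command::new("sh").args(["-c", "sleep 30 & wait"]))?;
    terminate_group(&child)?;
    let status = child.wait()?;
    println!("parent exited: {status}");
    Ok(())
}
```

Calling `Child::kill` here instead would only signal the `sh` process, which is the behaviour described in the question.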

I suspect that this will stop the shell from cleaning everything up when your program exits or when Ctrl-C is pressed, and that you'll need to implement that part of the logic yourself, including writing at least a signal handler for SIGINT.

I don't know what the normal way to do this is in Windows. There's a job API that should let you start both processes in a new job, but I'm not familiar with it and I don't know if it actually works as advertised. You could just use EnumProcesses to search all the processes on the system for the one you're looking for, more quickly than using WMI. Or, you could create a new console with the CREATE_NEW_CONSOLE flag when you start kubectl, then use GenerateConsoleCtrlEvent to effectively send it a Ctrl-C and get the normal OS behaviour.

I found the process-wrap crate, which wraps {std, tokio}::process::Command and can create a job object (JobObject) for a given Command on Windows. The command's subprocesses become part of the job object, and when it is killed all subprocesses are killed too, which solves my immediate problem.

I don't have a Linux machine with kubectl set up right now, so I cannot test that yet.

Looks like process-wrap still requires you to write your own SIGINT handler to kill the subprocesses when you use Ctrl-C.

I've only tested this on Windows, but the behavior matches my expectations.

The program is now able to cleanly restart kubectl port-forward when I manually kill the target pod (the pod is then automatically re-created by kubernetes, but the connection needs to be recreated):

PS C:\Users\lubomir.kurcak> tend r rancher-postgres
rancher-postgres: Forwarding from 0.0.0.0:5434 -> 5432
rancher-postgres: Handling connection for 5434
rancher-postgres: Handling connection for 5434
rancher-postgres (stderr): E0922 21:34:54.365199   46140 portforward.go:400] an error occurred forwarding 5434 -> 5432: error forwarding port 5432 to pod 8aa90c3e62bbc5a6cb372d225f06136c3fc1c05463ff831b8134978a7865c50d, uid : container not running (8aa90c3e62bbc5a6cb372d225f06136c3fc1c05463ff831b8134978a7865c50d)
rancher-postgres ran for 47.2s (restarting)
rancher-postgres (stderr): error: unable to forward port because pod is not running. Current status=Failed
rancher-postgres ran for 254ms (restarting)
rancher-postgres (stderr): error: unable to forward port because pod is not running. Current status=Failed
rancher-postgres ran for 254ms (restarting)
rancher-postgres (stderr): error: unable to forward port because pod is not running. Current status=Pending
rancher-postgres ran for 269ms (restarting in 1 seconds)
rancher-postgres: Forwarding from 0.0.0.0:5434 -> 5432
rancher-postgres: Handling connection for 5434
rancher-postgres: Handling connection for 5434

And I am able to kill the entire job with Ctrl-C and restart it immediately:

PS C:\Users\lubomir.kurcak> tend r rancher-postgres
rancher-postgres: Forwarding from 0.0.0.0:5434 -> 5432
PS C:\Users\lubomir.kurcak> ^C
PS C:\Users\lubomir.kurcak> tend r rancher-postgres
rancher-postgres: Forwarding from 0.0.0.0:5434 -> 5432

I intend to test this as soon as I get a Linux/Mac machine, but I am fine with this for now.

non-portable options:

  • linux: put them into a cgroup and kill that by writing to the cgroup.kill file
  • windows: job objects and TerminateJobObject
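As a rough sketch of the Linux option in the list above: with cgroups v2 on a reasonably recent kernel (the `cgroup.kill` file was added in Linux 5.14, as far as I know), writing `1` to that file kills every process in the cgroup and its descendants. The function name `kill_cgroup` and the cgroup path in `main` are hypothetical, and the sketch assumes the processes were already placed into that cgroup by some other means.

```rust
use std::fs::OpenOptions;
use std::io::{self, Write};
use std::path::Path;

/// Ask the kernel to kill everything in the cgroup at `cgroup_dir` by
/// writing "1" to its `cgroup.kill` file. The file is opened without
/// the create flag, so this returns an error when `cgroup.kill` does
/// not exist (for example on cgroups v1 or on an older kernel).
fn kill_cgroup(cgroup_dir: &Path) -> io::Result<()> {
    let mut file = OpenOptions::new()
        .write(true)
        .open(cgroup_dir.join("cgroup.kill"))?;
    file.write_all(b"1")
}

fn main() {
    // Hypothetical cgroup path, for illustration only.
    let dir = Path::new("/sys/fs/cgroup/user.slice/port-forwards");
    match kill_cgroup(dir) {
        Ok(()) => println!("kill requested for {}", dir.display()),
        Err(e) => eprintln!("could not kill cgroup: {e}"),
    }
}
```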

That is because on Windows job objects are hierarchical so every process inside the child job object you created would also be part of the parent job object that the terminal created. This is not the case on Unix however. Each job is entirely detached from the job of the process that created the new job.

With cgroups v2 (which systemd is pushing for), I believe you have to either be inside a service/slice/scope that has cgroup delegation enabled, so that you can create a child cgroup of your parent cgroup (cgroups v2 doesn't allow non-leaf cgroups to have processes attached to them), or use the systemd D-Bus API to spawn a new scope (with a corresponding cgroup) the same way systemd-run would.

On the systems I have there are generally some delegatable cgroups associated with a user, yeah. And a child process doesn't have to be in a cgroup that is a child of the current cgroup. Cgroups are not simply colored branches of the process tree.

I'm not sure if that's what's happening here, though. If we look at Microsoft's description of CTRL-C in console processes, the SIGINT is simply sent to every process which is attached to the console, regardless of the process or job hierarchy.

In contrast, in POSIX systems like Linux, the SIGINT is only sent to every process in the foreground process group. When creating a new background process group like the process-wrap crate does, the parent foreground process will need to catch SIGINT and signal all the background process groups itself.
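A minimal, Unix-only sketch of that last point: the parent installs a SIGINT handler that only sets a flag, and the main loop then forwards the signal to each background process group. `signal` and `kill` are declared by hand here (the libc crate exposes the same functions), `sleep 30` stands in for kubectl, and the helper names are made up for illustration. So that the example terminates on its own, `main` sends SIGINT to itself in place of a real Ctrl-C.

```rust
use std::io;
use std::os::unix::process::CommandExt;
use std::process::{Child, Command};
use std::sync::atomic::{AtomicBool, Ordering};
use std::thread;
use std::time::Duration;

// Declared by hand to avoid a dependency; the libc crate has the same functions.
unsafe extern "C" {
    fn signal(signum: i32, handler: extern "C" fn(i32)) -> usize;
    fn kill(pid: i32, sig: i32) -> i32;
}

// SIGINT has this value on Linux and macOS.
const SIGINT: i32 = 2;

static INTERRUPTED: AtomicBool = AtomicBool::new(false);

// Signal handlers may only do async-signal-safe work, so just set a flag.
extern "C" fn on_sigint(_sig: i32) {
    INTERRUPTED.store(true, Ordering::SeqCst);
}

fn install_sigint_handler() {
    unsafe { signal(SIGINT, on_sigint) };
}

/// Spawn `cmd` in its own (background) process group. A terminal
/// Ctrl-C only reaches the foreground group, so it will not reach it.
fn spawn_background(cmd: &mut Command) -> io::Result<Child> {
    cmd.process_group(0).spawn()
}

/// Send `sig` to the process group led by each child (negative pid).
fn forward_to_groups(children: &[Child], sig: i32) {
    for child in children {
        unsafe { kill(-(child.id() as i32), sig) };
    }
}

fn main() -> io::Result<()> {
    install_sigint_handler();
    let mut children = vec![spawn_background(Command::new("sleep").arg("30"))?];

    // Stand-in for the user pressing Ctrl-C in the terminal.
    unsafe { kill(std::process::id() as i32, SIGINT) };

    while !INTERRUPTED.load(Ordering::SeqCst) {
        thread::sleep(Duration::from_millis(50));
    }
    forward_to_groups(&children, SIGINT);
    for child in &mut children {
        let status = child.wait()?;
        println!("child exited: {status}");
    }
    Ok(())
}
```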
