Consider a program like xargs which may end up with multiple children at the same time and spawn more as time passes (see the -P option).
For educational purposes I'm trying to write an equivalent in Rust, but I got stumped here.
Googling around I found other toy implementations in Rust, all of which either don't support multiple processes or resort to multithreading to do it, which I consider a no-go for this particular problem.
If this were C I would just vfork to create children. The main loop would read(2), but also check a flag set by the SIGCHLD signal handler, indicating that waitpid is needed. There is basically nothing to it.
In Rust I found one is expected to use std::process::Command (which btw uses fork instead of vfork, which is kind of a loss). The docs only show how to deal with one specific child via .wait; I can't see any explanation of how to handle multiple processes.
I found a signal-hook crate, so I can find out I got SIGCHLD. But if I waitpid based on it and reap the child, I'm going to do it from under the respective Child object. If I don't waitpid I don't know what PID it is so I don't know which object to .wait on anyway. (I could waitpid with WNOWAIT as a hack, but there is no way this is how it should be done)
Rust code which I did find cops out of the problem by having a dedicated thread for each child, which is a waste of resources and can actively hurt real-world usage if one was to implement things like that in the real xargs.
So what's the expected way to structure this in Rust?
OK, so I might have misunderstood, and I might also be out of my depth here, but let me just ask: you can spawn multiple processes, so what do you want to do with them that you can't see how to do? Do you want to check their status, their PID, or something else?
The program would have an event loop waiting to parse more arguments coming in from stdin along with spawning and reaping children as needed. For example with, say, -P 20 I might have 16 children, parse some extra input from stdin and find that some of them exited. How do I even find out which Child objects are affected?
This is almost certainly sub-optimal, but will work without dragging in a runtime like Tokio.
You could keep a BTreeMap<u32, Child> to track your children, where you get the u32 from Child::id(). Then, when you get SIGCHLD, you can use BTreeMap::remove() to remove the referenced Child from the map, and call Child::try_wait; if it's not finished, put it back in the BTreeMap for a later iteration to find.
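A minimal sketch of that idea, using only std. The `Children` type and its method names are made up for illustration, and it assumes the PID to check arrives from somewhere else (for example a SIGCHLD notification):

```rust
use std::collections::BTreeMap;
use std::io;
use std::process::{Child, Command, ExitStatus};

/// Hypothetical helper: running children, keyed by PID.
struct Children {
    running: BTreeMap<u32, Child>,
}

impl Children {
    fn new() -> Self {
        Children { running: BTreeMap::new() }
    }

    /// Spawn a command and remember its Child under its PID.
    fn spawn(&mut self, cmd: &mut Command) -> io::Result<u32> {
        let child = cmd.spawn()?;
        let pid = child.id();
        self.running.insert(pid, child);
        Ok(pid)
    }

    /// Number of children still tracked (compare against the -P limit).
    fn running_count(&self) -> usize {
        self.running.len()
    }

    /// Check one PID. Returns Ok(None) if the PID is unknown or the
    /// child is still running; in the latter case it goes back in the map.
    fn reap(&mut self, pid: u32) -> io::Result<Option<ExitStatus>> {
        let mut child = match self.running.remove(&pid) {
            Some(child) => child,
            None => return Ok(None),
        };
        match child.try_wait()? {
            Some(status) => Ok(Some(status)),
            None => {
                // Not finished yet: keep tracking it for a later pass.
                self.running.insert(pid, child);
                Ok(None)
            }
        }
    }
}
```

With -P 20, the main loop would compare `running_count()` against the limit before spawning more, and call `reap(pid)` only for PIDs it was told about.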
That would be a waitpid call for every process, every time. That's even slower than the hack I described where I waitpid(WNOWAIT) and then I know which child explicitly to wait on.
This is not a hacky solution - it's tracking the Child objects by PID (which may be sub-optimal), and using the knowledge you have (from siginfo_t::si_pid() or from waitpid(WNOWAIT), both work for this) to wait only on the PIDs that you expect to see information about.
If you want a solution that doesn't require looping over all the Child objects, and doesn't require you to track the Child objects by PID, then you will need to go "underneath" the Child abstraction, and do things the way you'd do them in C - drop the Child objects once the process is started (which leaves the process running in the background), and wait on the PIDs when you're notified that there's a reason to wait.
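Roughly what going underneath `Child` could look like without any crates, declaring waitpid by hand. `spawn_detached` and `wait_pid` are hypothetical helper names, and the code assumes a Unix target where pid_t and int are both 32 bits:

```rust
use std::io;
use std::os::unix::process::ExitStatusExt;
use std::process::{Command, ExitStatus};

// Hand-written declaration so no libc crate is needed.
// `unsafe extern` needs Rust 1.82+; older toolchains spell it `extern "C"`.
unsafe extern "C" {
    fn waitpid(pid: i32, status: *mut i32, options: i32) -> i32;
}

/// Spawn the command and keep only the PID. Dropping the Child does
/// not kill or reap the process; it keeps running in the background.
fn spawn_detached(cmd: &mut Command) -> io::Result<i32> {
    let child = cmd.spawn()?;
    Ok(child.id() as i32)
}

/// Block until the given PID exits and return its status.
fn wait_pid(pid: i32) -> io::Result<ExitStatus> {
    let mut status: i32 = 0;
    loop {
        // SAFETY: `status` is a valid, writable i32 for the whole call.
        let ret = unsafe { waitpid(pid, &mut status, 0) };
        if ret != -1 {
            return Ok(ExitStatus::from_raw(status));
        }
        let err = io::Error::last_os_error();
        if err.kind() != io::ErrorKind::Interrupted {
            return Err(err); // e.g. ECHILD if it was already reaped
        }
        // EINTR: a signal arrived first, so just retry.
    }
}
```

The catch is that nothing stops you from waiting twice on the same PID any more; the second call fails with an error instead of returning a cached status the way `Child` does.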
Indeed having a fd for every child and plugging that into an event loop would do the trick very cleanly, it is a bummer the feature is in Nightly only for the moment. With the assumption this is going to get fully beaten into shape and available in regular builds going forward, I would say this is the way to go.
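For anyone who wants to try the fd-per-child idea on stable today, here is a rough sketch that swaps the nightly std API for a raw pidfd_open syscall plus poll(2). It is Linux-only (5.3+), the syscall number 434 and the POLLIN value are hard-coded assumptions, and `pidfd_for` / `poll_exited` are made-up names:

```rust
use std::io;
use std::os::raw::{c_int, c_long, c_ulong};
use std::os::unix::io::{AsRawFd, FromRawFd, OwnedFd};
use std::process::Child;

// Assumed Linux values, hard-coded to avoid the libc crate.
const SYS_PIDFD_OPEN: c_long = 434;
const POLLIN: i16 = 0x0001;

/// Mirror of C's `struct pollfd`.
#[repr(C)]
struct PollFd {
    fd: c_int,
    events: i16,
    revents: i16,
}

// `unsafe extern` needs Rust 1.82+; older toolchains spell it `extern "C"`.
unsafe extern "C" {
    fn syscall(number: c_long, ...) -> c_long;
    fn poll(fds: *mut PollFd, nfds: c_ulong, timeout: c_int) -> c_int;
}

/// Open a pidfd for an already-spawned child. Fails on kernels or
/// sandboxes without pidfd_open.
fn pidfd_for(child: &Child) -> io::Result<OwnedFd> {
    // SAFETY: plain syscall taking two integer arguments.
    let fd = unsafe { syscall(SYS_PIDFD_OPEN, child.id() as c_long, 0 as c_long) };
    if fd < 0 {
        return Err(io::Error::last_os_error());
    }
    // SAFETY: the kernel just handed us this fd and nobody else owns it.
    Ok(unsafe { OwnedFd::from_raw_fd(fd as c_int) })
}

/// Wait up to `timeout_ms` (-1 = forever) and return the indices of
/// the pidfds whose process has exited.
fn poll_exited(fds: &[OwnedFd], timeout_ms: c_int) -> io::Result<Vec<usize>> {
    let mut pfds: Vec<PollFd> = fds
        .iter()
        .map(|fd| PollFd { fd: fd.as_raw_fd(), events: POLLIN, revents: 0 })
        .collect();
    // SAFETY: the pointer and length describe the live `pfds` vector.
    let n = unsafe { poll(pfds.as_mut_ptr(), pfds.len() as c_ulong, timeout_ms) };
    if n < 0 {
        return Err(io::Error::last_os_error());
    }
    Ok(pfds
        .iter()
        .enumerate()
        .filter(|(_, p)| p.revents & POLLIN != 0)
        .map(|(i, _)| i)
        .collect())
}
```

In a real event loop the pidfds would sit in the same poll set as stdin, and each index that comes back tells you exactly which `Child` to call `try_wait` on.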
Well, the waitpid(WNOWAIT) thing is just another syscall trip which should not be necessary. I don't see how to extract the info from the signal in Rust; the one crate I found which does not spawn threads behind my back only allows setting a flag.
Anyhow, The Right Way(tm) (as far as I'm concerned, at least) was linked by 2e71828 above.
Using the signal-hook crate, you'd use an Exfiltrator that gets the origin of the signal. Then you have the PID, and you can look it up in your data structure of choice, or just wait for the child directly if you've already dropped the Child.
You could use Tokio, which does its own syscalls and so doesn't have to wait for std stabilization of anything. (I checked and it seems that its implementation uses pidfd on Linux and polling all children on other Unix, which seems reasonable.)
Poking around I don't think it does the job though. To my reading there is only space for one siginfo per signal number, meaning if I get 2 SIGCHLDs before I manage to read it, I'm going to lose one of them.
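Standard signals do coalesce, so the usual workaround is to treat SIGCHLD purely as a wake-up and then reap in a loop until nothing is left, rather than trusting one siginfo per exit. A rough std-only sketch, assuming the Child objects were already dropped and assuming WNOHANG == 1 and ECHILD == 10 (true on Linux, but check your platform); `reap_all_exited` is a made-up name:

```rust
use std::io;
use std::os::unix::process::ExitStatusExt;
use std::process::ExitStatus;

// Assumed platform constants, hard-coded to avoid the libc crate.
const WNOHANG: i32 = 1;
const ECHILD: i32 = 10;

// `unsafe extern` needs Rust 1.82+; older toolchains spell it `extern "C"`.
unsafe extern "C" {
    fn waitpid(pid: i32, status: *mut i32, options: i32) -> i32;
}

/// Reap every child that has already exited, without blocking.
/// Call this once per SIGCHLD wake-up; it copes with several exits
/// having been merged into a single signal.
fn reap_all_exited() -> io::Result<Vec<(i32, ExitStatus)>> {
    let mut reaped = Vec::new();
    loop {
        let mut status: i32 = 0;
        // SAFETY: `status` is a valid, writable i32 for the whole call.
        let pid = unsafe { waitpid(-1, &mut status, WNOHANG) };
        if pid > 0 {
            // One more exited child; keep going, there may be others.
            reaped.push((pid, ExitStatus::from_raw(status)));
        } else if pid == 0 {
            // Children exist, but none of them has exited yet.
            return Ok(reaped);
        } else {
            let err = io::Error::last_os_error();
            if err.raw_os_error() == Some(ECHILD) {
                // No children left at all.
                return Ok(reaped);
            }
            return Err(err);
        }
    }
}
```

Note that `waitpid(-1, ...)` reaps any child of the process, so this only fits a program that owns all of its children, which an xargs clone does.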
I would argue there should be an easy way to do this without pulling in any big crates. That aside, tokio's own description of how they handle the matter basically discourages it. They make a fishy claim that pid fds are not pollable; I'm going to have to look into it.
greppety-grep and I found that poll is supported for pidfd, so looks like their commentary is just stale(?). Interested parties can find the implementation here: linux/fs/pidfs.c at master · torvalds/linux · GitHub
I think the comment is stale in that it describes only the non-Linux implementation which doesn't use pid fds, not the Linux-only implementation which does.
I actually implemented exactly this in a small tool I wrote for personal use
It seems to work well, but I can't promise it doesn't have UB or that the Windows impl works correctly.
Feel free to read the code here. If I remember correctly, I created a process group, added every subprocess to it, and then called waitpid on the group ID.