Way to Spawn a Daemon and Then Drop the Stderr Pipe

I have a situation where I am spawning a child process that will listen on a socket for RPC calls. My program that is starting the daemon will spawn the process and then attempt to connect on the socket to make sure the process was successful. If it was not successful I want to grab the stderr and print it so that the user can see why the process failed. When I spawn the process I set stderr to Piped so that I can capture it in case there is a failure.

The problem is that, when the daemon succeeds, I want my program to exit and let the daemon run in the background. Because I have set stderr to Piped, though, when my program exits, the child process stalls because, I'm assuming, the stderr pipe was broken.

Everything works fine if I set stderr to null, but the problem with that is that I can't get the error message if the daemon fails in that case.

I guess I can make the daemon process log to a file instead of stderr so that I can get the output that way if that is my only option, but I wanted to make sure I can't do it how I've described first.

Note: I'm the one writing the daemon so I'm not sure if it is my fault that the daemon freezes when the pipe is broken. All of the output is generated with eprintln!. I would think that the daemon would be panicking if I eprintln! on the broken pipe, but the process is still visible in htop so I don't know exactly what is going on.
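For reference, the startup check described in the question can be sketched with only the standard library, roughly as below. The helper name spawn_and_check, the attempt count, and the 50 ms poll interval are made up for illustration, and a bare connect stands in for whatever RPC handshake the real program performs.

```rust
use std::io::Read;
use std::os::unix::net::UnixStream;
use std::path::Path;
use std::process::{Child, Command, Stdio};
use std::thread::sleep;
use std::time::Duration;

/// Spawn `cmd` with stderr piped, then poll `socket_path` until it accepts a
/// connection. If the child exits first, return what it wrote to stderr.
/// (Hypothetical helper: the name and retry policy are illustrative only.)
fn spawn_and_check(mut cmd: Command, socket_path: &Path, attempts: u32) -> Result<Child, String> {
    let mut child = cmd
        .stdin(Stdio::null())
        .stdout(Stdio::null())
        .stderr(Stdio::piped())
        .spawn()
        .map_err(|e| format!("could not spawn: {}", e))?;

    for _ in 0..attempts {
        // The daemon is considered up as soon as the socket accepts a connection.
        if UnixStream::connect(socket_path).is_ok() {
            return Ok(child);
        }
        // If the child has already exited, collect its error output.
        if let Ok(Some(status)) = child.try_wait() {
            let mut msg = String::new();
            if let Some(mut stderr) = child.stderr.take() {
                let _ = stderr.read_to_string(&mut msg);
            }
            return Err(format!("daemon exited with {}: {}", status, msg));
        }
        sleep(Duration::from_millis(50));
    }

    // Still no socket and no exit: give up and clean up the child.
    let _ = child.kill();
    let _ = child.wait();
    Err("daemon did not come up in time".to_string())
}
```

On success the returned Child still owns the read end of the stderr pipe, so whatever the caller does with it next decides when the daemon sees that pipe close.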

You might want to set SIGPIPE to SIG_IGN in the child using sigaction, if you haven't already; otherwise the child will be terminated if it writes to stderr after the pipe is broken.

What does /proc/$pid/wchan say for the hung child thread?

You might also want to take a look at daemon(7).

Would that be the same as using the signal-hook crate like this, or would it be better to use the libc crate directly?

// Ignore broken pipes
unsafe {
    signal_hook::register(signal_hook::SIGPIPE, || ())?;
}

futex_wait_queue_me

Thanks for the link. In reality my daemon isn't a true daemon, but just a background process that listens on a socket and takes action on the clients' behalf so I luckily don't have to worry about that stuff.

signal-hook would do it, but it has a lot more than you need. The nix crate has a simple wrapper for sigaction, and I don't think there are any pitfalls to a simple sigaction(Signal::SIGPIPE, &SigAction::new(SigHandler::SigIgn, SaFlags::empty(), SigSet::empty())) (the call itself is unsafe).

Hmm, that means the child is waiting on a mutex.. Do you have any explicit synchronization in your child process? Otherwise I'd speculate it's the implicit locking in std::io::Stderr where another thread has the lock blocking on writing to the pipe (which would only be true if you didn't in fact close the other end of the pipe) and your main thread (the one with the process's pid) is blocking waiting for the lock.

Maybe use ptrace to find out?

Ok, cool.

There is an Arc<AtomicBool> that is used to stop a server listen loop and there is an RPC server that I didn't write that is likely doing some synchronization with its thread-pool.

This is the code that starts the process. Essentially, if I change one of those Stdio::null()s to Stdio::piped(), it will hang, which is why I am pretty sure the locking is related to standard out/error. But I'm not sure yet if it only starts to hang after it tries to connect to the socket. I'll have to do some more testing.

So at least that gives me a good lead to investigate.

How would I use ptrace? Is there an easy/Rusty way to do it? I was looking at the man page, but I'm not sure where to start to figure out what it is waiting on.

OK, I just found something interesting out. So I'm starting the daemon like this:

Command::new(std::env::current_exe()?)
    .args(&["daemon", "--socket-path", &socket_path, "start", "-F"])
    .stdin(Stdio::null())
    .stdout(Stdio::null())
    .stderr(Stdio::piped())
    .spawn()
    .context("Could not start lucky daemon")?;

And the daemon will hang because of the Stdio::piped() for stderr. But if I capture the Child in a variable like this:

let child = Command::new(std::env::current_exe()?)
    .args(&["daemon", "--socket-path", &socket_path, "start", "-F"])
    .stdin(Stdio::null())
    .stdout(Stdio::null())
    .stderr(Stdio::piped())
    .spawn()
    .context("Could not start lucky daemon")?;

The daemon will be perfectly responsive until child goes out of scope. I made sure of it by sleeping after spawning the server, making sure it worked, dropping child, then sleeping again and verifying that it didn't work.

The strange thing is that the API docs say that Child doesn't implement Drop:

There is no implementation of Drop for child processes, so if you do not ensure the Child has exited then it will continue to run, even after the Child handle to the child process has gone out of scope.

Edit: It must have to do, not with the child getting dropped but the ChildStderr inside of it getting dropped:

When an instance of ChildStdout is [dropped], the ChildStdout's underlying file handle will be closed.

So it seems like, when the file handle held by the parent process is closed, the child process gets stuck waiting on a mutex. Maybe that is because whichever thread in the child process was writing to stderr panicked when the pipe was closed, leaving another thread waiting on a Mutex held by the panicked thread?

You can try to forge your own low-level pipe, so as to avoid having to deal with any locks whatsoever.

With Unix:

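// Contents of a `raw_pipe` module (e.g. `src/raw_pipe.rs`); needs the `libc` crate.
use std::{
    fs::File,
    os::unix::io::{FromRawFd, RawFd},
    process::{self, Command},
};

struct RawPipe {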
    reader: RawFd,
    writer: RawFd,
}

impl RawPipe {
    pub
    fn new () -> Self
    {
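        let mut reader_writer: [RawFd; 2] = [0, 0];
        assert_ne!(
            unsafe { ::libc::pipe(reader_writer.as_mut_ptr()) },
            -1,
            "SYS_pipe() failed.",
        );
        // Mark both ends close-on-exec so the child only inherits the end that
        // `Command` installs as its stdio. Otherwise the child keeps its own
        // copy of the other end and the pipe never actually breaks.
        for &fd in &reader_writer {
            unsafe { ::libc::fcntl(fd, ::libc::F_SETFD, ::libc::FD_CLOEXEC); }
        }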
        Self {
            writer: reader_writer[1],
            reader: reader_writer[0],
        }
    }
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub
enum PipeTarget {
    Stdin,
    Stdout,
    Stderr,
}

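// The trait definition was left out of the snippet as posted; this is the
// minimal sealed extension trait that the impls below require.
mod private {
    pub trait Sealed {}
}

pub
trait CommandExt : private::Sealed {
    fn raw_pipe (self: &'_ mut Self, target: PipeTarget)
      -> File
    ;
}

impl private::Sealed for Command {}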
impl CommandExt for Command {
    fn raw_pipe (self: &'_ mut Self, target: PipeTarget)
      -> File
    {
        let RawPipe { reader, writer } = RawPipe::new();
        let (pipe, file) = match target {
            | PipeTarget::Stdin
            => unsafe {
                (
                    process::Stdio::from_raw_fd(reader),
                    File::from_raw_fd(writer),
                )
            },

            | PipeTarget::Stdout
            | PipeTarget::Stderr
            => unsafe {
                (
                    process::Stdio::from_raw_fd(writer),
                    File::from_raw_fd(reader),
                )
            },
        };
        (match target {
            | PipeTarget::Stdin  => Command::stdin,
            | PipeTarget::Stdout => Command::stdout,
            | PipeTarget::Stderr => Command::stderr,
        })(self, pipe);
        file
    }
}

Usage:

fn main () -> ::std::io::Result<()>
{
    use ::std::{*,
        io::{BufRead, BufReader},
    };

    // Spawn the child with a handcrafted pipe on stderr
    let (child, child_stderr) = {
        use raw_pipe::{CommandExt, PipeTarget};

        let mut cmd = process::Command::new("/bin/sh");
        cmd .arg("-c")
            .arg("while :; do echo hi; done >&2")
        ;
        let child_stderr = cmd.raw_pipe(PipeTarget::Stderr);
        let child = cmd.spawn()?;
        (child, child_stderr)
    };

    // Buffered reads ought to be better.
    let mut child_stderr =
        BufReader::with_capacity(256, child_stderr)
    ;
    
    // Read the first line
    dbg!({
        let mut line = String::new();
        child_stderr.read_line(&mut line)?;
        line
    });

    // Break the pipe
    drop(child_stderr);

    // let the program run with a broken pipe
    thread::sleep(time::Duration::from_millis(500));

    // Since the program runs indefinitely, let's kill it after a while
    {child}.kill()
}

Ah, besides SIGPIPE, there's also the fact that eprintln!() panics on errors writing to stderr. So it seems this case of closing stderr but continuing to eprint!() to it in the child is not supported by Rust (whereas when you redirect to /dev/null, the output is consumed and ignored by the kernel). So you must either take care not to eprint!() to stderr after the parent closes it, or use your own pipe as Yandros suggested.

However, I don't think libstd would deadlock on a panic while writing to Stderr.. So there must be something else going on here.

I'm not sure, but at least with strace you might get to see the last syscall on each thread before the hang..

EDIT: sorry, libstd panics in eprint!(), not std::io::Stderr

That would help explain why setting the SIGPIPE handler to nothing didn't help the issue.

Ah, OK. So then I could try just using writeln!(std::io::stderr(), "message").ok() and ignore any failures to print to standard error.
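A minimal sketch of that idea, assuming nothing beyond the standard library. The names log_line, log_stderr, and BrokenPipe are invented for the example; BrokenPipe is just a stand-in writer that fails the way a closed pipe would.

```rust
use std::io::{self, Write};

/// Write one line to `out`, discarding any I/O error (such as EPIPE)
/// instead of panicking the way `eprintln!` does.
fn log_line<W: Write>(out: &mut W, msg: &str) {
    let _ = writeln!(out, "{}", msg);
}

/// Convenience wrapper around the real stderr.
fn log_stderr(msg: &str) {
    log_line(&mut io::stderr(), msg);
}

/// A writer that always fails, standing in for a pipe whose reader is gone.
struct BrokenPipe;

impl Write for BrokenPipe {
    fn write(&mut self, _buf: &[u8]) -> io::Result<usize> {
        Err(io::Error::from(io::ErrorKind::BrokenPipe))
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}
```

Calling log_line(&mut BrokenPipe, "...") returns normally, which is the behavior wanted in the daemon once the parent has gone away.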

I just found out how to use gdb and gdbgui to debug my Rust application, which was refreshingly easy. It appears that the deadlock is happening somewhere in the socket server that I'm running, but I should be able to figure it out now, I think, if switching from eprintln! to writeln! doesn't fix the problem outright.

Haha! Fixed it! I just had to replace all of my eprintln!s with writeln!s so that I can ignore any errors encountered while writing to stderr.

It seems like the problem was that a worker thread in the daemon was panicking while trying to write to stderr, then the parent thread that manages the workers was getting blocked waiting for a channel response from the panicked worker thread. I'm not positive that's exactly what was happening, but either way it is fixed now.
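One way a hang like that can arise, purely as an illustration and not a reconstruction of the actual daemon: if the manager still holds a Sender for the reply channel, a panicking worker does not disconnect the channel, so a plain recv() never returns. The sketch below uses recv_timeout to make the stall observable. The function name wait_for_worker and the one-second timeout are assumptions for the example.

```rust
use std::sync::mpsc::{self, RecvTimeoutError};
use std::thread;
use std::time::Duration;

/// Start one worker and wait for its reply.
/// (Hypothetical stand-in for the daemon's manager thread.)
fn wait_for_worker(worker_fails: bool) -> Result<u32, RecvTimeoutError> {
    let (tx, rx) = mpsc::channel::<u32>();
    let worker_tx = tx.clone();

    thread::spawn(move || {
        if worker_fails {
            // Stands in for eprintln! panicking on a closed stderr.
            panic!("failed printing to stderr");
        }
        worker_tx.send(42).unwrap();
    });

    // `tx` is still alive here, so the channel stays connected even after the
    // worker dies. A plain `rx.recv()` would block forever in that case.
    let reply = rx.recv_timeout(Duration::from_secs(1));
    drop(tx);
    reply
}
```

With worker_fails set to true the call reports a timeout instead of hanging, and with false it returns the worker's reply.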

Thanks @geogriff and @Yandros for your help.

nice. yeah, note that eprintln! is not great to use except in the simplest programs, not only because of panicking on IO error, but because it's not buffered at all, resulting in a ton of write syscalls (even one separate write for the newline at the end). I'd suggest, for logging, probably use the log crate, and have your logger write! to a String buffer, and then stderr.write_all() it all in one shot, ignoring errors.
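The "format into a String, then write it out in one shot" part can be sketched with the standard library alone. This is not a log crate logger; the names emit and CountingWriter are invented here, and a real log::Log implementation would presumably do something similar from its log method. CountingWriter exists only to show that a whole record arrives in a single write call.

```rust
use std::fmt::Write as _;
use std::io::{self, Write};

/// Format one record into a String, then hand it to `out` with a single
/// `write_all` call, ignoring any I/O error.
fn emit<W: Write>(out: &mut W, level: &str, msg: &str) {
    let mut line = String::with_capacity(msg.len() + 16);
    let _ = write!(line, "[{}] {}", level, msg);
    line.push('\n');
    let _ = out.write_all(line.as_bytes());
}

/// Records how many times `write` is called and what was written.
struct CountingWriter {
    calls: usize,
    bytes: Vec<u8>,
}

impl Write for CountingWriter {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.calls += 1;
        self.bytes.extend_from_slice(buf);
        Ok(buf.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}
```

For real output the call would be something like emit(&mut io::stderr(), "WARN", "socket closed"), which writes the line and its newline together rather than as separate writes.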

Sweet. Thanks for the tip.