Reading from pipe via stdin in binary

I'm implementing a FastCGI responder for Apache mod_fcgid.

The responder reads binary records from stdin, and writes binary records to stdout. Stdin and stdout are UNIX pipes. I'm getting an EINVAL at the first read.

So I stripped the program down to this:

pub fn main() {
    logger();   // start logging
    let mut outio = std::io::stdout();
    let inio = std::io::stdin();
    let mut instream = inio.lock(); // Lock the stdin for reading.
    // ***TEMP TEST***
    let mut header_bytes:[u8;8] = Default::default();
    use std::io::Read;
    let stat = instream.read_exact(&mut header_bytes);
    log::debug!("Stat: {:?} Bytes: {:?}", stat, header_bytes);
    std::process::exit(0);
    // ***END TEMP***
}

/// Debug logging
fn logger() {
    //  Log file is openly visible as a web page.
    //  Only for debug tests.
    const LOG_FILE_NAME: &str = "logs/echolog.txt";
    let _ = simplelog::CombinedLogger::init(vec![
            simplelog::WriteLogger::new(
                simplelog::LevelFilter::Debug,
                simplelog::Config::default(),
                std::fs::File::create(LOG_FILE_NAME).expect("Unable to create log file"),
            ),
        ]);
    log::warn!("Logging to {:?}", LOG_FILE_NAME); // where the log is going
}

which gets me, when run under mod_fcgid under Apache on a Linux server,

19:14:55 [WARN] Logging to "logs/echolog.txt"
19:14:55 [DEBUG] (1) echo: Stat: Err(Os { code: 22, kind: InvalidInput, message: "Invalid argument" }) Bytes: [0, 0, 0, 0, 0, 0, 0, 0]

Now, according to GitHub Copilot, this can occur when non-UTF-8 characters come in via stdin. read_exact is a binary read, though, so this should work.

Do I need to do something to put "stdin" in "raw mode" here?

EINVAL (Invalid argument) is not about Rust’s read_exact expecting UTF-8 — read_exact is purely binary. It comes from the underlying system call read(2) returning -1 with errno=EINVAL.

In the context of mod_fcgi (and mod_fcgid in particular), Apache sets up pipes for FastCGI requests. For some configurations (especially mod_fcgid), stdin may be empty for the first read if:

  • No request body is sent (i.e., CONTENT_LENGTH=0).

  • The pipe is actually a socketpair instead of a true pipe.

This is probably not a Rust issue.

To avoid this error, you can try the following:

let content_length = std::env::var("CONTENT_LENGTH")
    .ok()
    .and_then(|s| s.parse::<usize>().ok())
    .unwrap_or(0);

let mut stdin = std::io::stdin().lock();

if content_length > 0 {
    let mut body = vec![0u8; content_length];
    stdin.read_exact(&mut body)?;
    log::debug!("Read body: {:?}", body);
} else {
    log::debug!("No stdin content to read");
}

Apache should set this environment variable if there's a request body.

If it doesn't work, for pipes that may signal "EOF immediately", you can do:

use std::io::Read;

let mut stdin = std::io::stdin().lock();
let mut buffer = [0u8; 8];
match stdin.read(&mut buffer) {
    Ok(n) => log::debug!("Read {} bytes: {:?}", n, &buffer[..n]),
    Err(e) => log::error!("Stdin read error: {:?}", e),
}

read returns Ok(0) on EOF, whereas read_exact fails with an UnexpectedEof error. If none of those work, then maybe you should check your mod_fcgid configuration.
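To make the difference between the two calls concrete, here is a minimal sketch that reads one 8-byte header from any reader and separates a clean end-of-stream from a truncated record. The helper name read_header is made up for illustration and is not part of any FastCGI crate.

```rust
use std::io::{self, ErrorKind, Read};

/// Reads one 8-byte header.
/// Returns Ok(None) if the stream ended before any byte arrived, and an
/// UnexpectedEof error if it ended in the middle of the header.
/// `read_header` is a hypothetical helper used only for this illustration.
fn read_header<R: Read>(reader: &mut R) -> io::Result<Option<[u8; 8]>> {
    let mut header = [0u8; 8];
    let mut filled = 0;
    while filled < header.len() {
        match reader.read(&mut header[filled..]) {
            // Zero bytes before anything was read: clean EOF.
            Ok(0) if filled == 0 => return Ok(None),
            // Zero bytes after a partial header: the record was cut short.
            Ok(0) => {
                return Err(io::Error::new(ErrorKind::UnexpectedEof, "truncated header"))
            }
            Ok(n) => filled += n,
            // Retry if a signal interrupted the call.
            Err(e) if e.kind() == ErrorKind::Interrupted => continue,
            // Any other OS error (EINVAL included) is passed through unchanged.
            Err(e) => return Err(e),
        }
    }
    Ok(Some(header))
}
```

It can be called as read_header(&mut std::io::stdin().lock()). An error coming from the descriptor itself still surfaces through the last match arm, so this only tidies up EOF handling.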

Raw mode shouldn't be needed for pipes.

Very likely.

This module uses AF_UNIX sockets or named pipes, depending on the platform, to communicate with FastCGI applications. This directive specifies the directory where those sockets or named pipes will be created.

So it's usually AF_UNIX sockets on Linux. OK, now to make that work.

I don't get to control the socket binding. Or the Apache configuration, this being shared hosting. My program runs as a subprocess of the Apache server. When my program starts, std::io::stdin() does whatever it does when it encounters a socket. This is similar to what happens when a program runs in a pipeline. (Or do those still use pipes?)

Google's AI, given "rust stdin AF_UNIX socket", has advice on how to handle this from the parent process, but not the child process.

Long, inconclusive Reddit discussion: https://www.reddit.com/r/rust/comments/16mzzmi/why_stdin_is_not_file_in_rust/

Why isn't std::io handling this automatically? In Go, it Just Works.

The read() syscall should work on AF_UNIX SOCK_STREAM sockets too.

Having strace output would be nice but since you don't control the box maybe try

debug!("stdin points to {}", fs::read_link("/proc/self/fd/0")?.display())

to see if that's even a pipe or a socket.

If it actually is a socket then maybe it's in a state where it can't receive. In that case turning stdin into a UnixStream - via UnixStream::from(instream.as_fd().try_clone_to_owned()?) - and then calling its read method might provide slightly more informative error codes.

Printing environment variables also might show things of interest, maybe it's not always stdin. FCGI is not the same as CGI.

Edit: Yeah... FCGI spec says

A FastCGI application calls accept() on the socket referred to by file descriptor FCGI_LISTENSOCK_FILENO to accept a new transport connection.

You can get a listening socket via UnixListener::from_raw_fd and then accept on that.

the8472 wrote: maybe try

debug!("stdin points to {}", fs::read_link("/proc/self/fd/0")?.display())

to see if that's even a pipe or a socket.

Did that, and also dumped the environment variables:

03:23:44 [WARN] stdin points to socket:[3237692608]
03:23:44 [WARN] Environment: Vars { inner: [("PATH", "/usr/local/bin:/usr/bin:/bin")] }

So it's a socket. (Also, no useful info in the environment.)

Tried binding to the socket, using the name /proc/self/fd/0:

03:52:26 [WARN] stdin points to socket:[3242133963]
03:52:26 [WARN] Environment: Vars { inner: [("PATH", "/usr/local/bin:/usr/bin:/bin")] }
03:52:26 [ERROR] bind function failed: Os { code: 98, kind: AddrInUse, message: "Address already in use" }

Can't bind to it, presumably because it's already bound to stdin. Reasonable.

Can't call accept() on the result of std::io::stdin() because it's not a UnixListener. Also reasonable.

Error[E0599]: no method named `accept` found for struct `Stdin` in the current scope
  --> src/examples/echo/echo.rs:79:33
   |
79 |     let socket = match listener.accept() {
   |                                 ^^^^^^ method not found in `Stdin`

Tried:

    let stdin = std::io::stdin();
    drop(stdin);

But that doesn't help.

I got it to work with this hack:

use std::os::unix::net::UnixListener;
use std::os::fd::FromRawFd;
let mut listener = None;
unsafe { // ***AARGH***
    listener = Some(UnixListener::from_raw_fd(0));
};
let listener = listener.unwrap();
let socket = match listener.accept() {
    Ok((socket, addr)) => {
        log::info!("Got a client: {addr:?}");
        socket }
    Err(e) => {
        log::error!("accept function failed: {e:?}");
        panic!("Can't open");
    }
};
let mut instream = std::io::BufReader::new(socket); 

This is awful. Is there a better way?

Maybe you can try the following:

use std::os::unix::net::UnixStream;
use std::os::fd::FromRawFd;
use std::io::{BufReader, Read};

fn main() {
    unsafe {
        let socket = UnixStream::from_raw_fd(0);

        let mut instream = BufReader::new(socket);

        let mut header = [0u8; 8];
        match instream.read_exact(&mut header) {
            Ok(_) => eprintln!("Got header: {:?}", header),
            Err(e) => eprintln!("Read error: {e:?}"),
        }
    }
}

Still unsafe, but I think it's better to not use accept.

OwnedFd and related traits are the philosopher's stone for file-types.

let duped_stdin = stdin.as_fd().try_clone_to_owned()?;
let listener = UnixListener::from(duped_stdin);

Then you can do an accept loop.

Accept is the right thing to do for a listening socket, you can't read from one.
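A small sketch of why that matters, assuming Linux and a throwaway socket path chosen by the caller. Both function names are made up for this illustration. The first function treats a listening socket as if it were a connected stream and reads from it, which is roughly what reading FD-0 through Stdin amounts to here. The second accepts a connection first and then reads.

```rust
use std::io::{self, Read, Write};
use std::os::fd::AsFd;
use std::os::unix::net::{UnixListener, UnixStream};
use std::path::Path;

/// Reads directly from a listening socket by wrapping a duplicate of its
/// descriptor in a UnixStream. On Linux this is expected to fail.
fn read_listener_directly(listener: &UnixListener) -> io::Result<usize> {
    let dup = listener.as_fd().try_clone_to_owned()?;
    let mut as_stream = UnixStream::from(dup);
    let mut buf = [0u8; 8];
    as_stream.read(&mut buf)
}

/// Connects a client (standing in for the web server), accepts it on the
/// listener, and reads the 8 bytes the client sent.
fn accept_and_read(listener: &UnixListener, path: &Path) -> io::Result<[u8; 8]> {
    let mut client = UnixStream::connect(path)?;
    client.write_all(&[1, 1, 0, 1, 0, 8, 0, 0])?;
    let (mut conn, _addr) = listener.accept()?;
    let mut header = [0u8; 8];
    conn.read_exact(&mut header)?;
    Ok(header)
}
```

With a listener created by UnixListener::bind on some temporary path, the first call should return the same "Invalid argument" error seen at the top of the thread, and the second should return the eight bytes.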


Rust expects FD-0, 1 and 2 to always act as stdin, stdout and stderr respectively, while FCGI passes a socket as FD-0 and leaves FD-1 and 2 unallocated. So you'll need to clean up the file descriptors at the start of the program; otherwise any write to stdout/stderr is a use-after-free vulnerability.

Something like this playground:

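use std::fs::File;
use std::mem::forget;
use std::os::fd::{AsRawFd, FromRawFd, OwnedFd, RawFd};
use std::os::unix::net::UnixListener;
use std::path::Path;

// Opens `path` and checks that it landed on the expected (lowest free) descriptor.
fn open_fd(fd: RawFd, path: impl AsRef<Path>) {
    let file = File::open(path).expect("could not open file");
    assert_eq!(file.as_raw_fd(), fd);
    forget(OwnedFd::from(file)); // leak it so the descriptor stays open until exit
}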

fn main() {
    open_fd(1, "/dev/null"); // alternatively open a log file
    open_fd(2, "/dev/null"); // alternatively open a log file
    let socket_fd = unsafe { OwnedFd::from_raw_fd(0) }.try_clone().expect("could not clone FD"); // allocates FD-3 and drops FD-0
    assert_eq!(socket_fd.as_raw_fd(), 3);
    open_fd(0, "/dev/null");
    let socket = UnixListener::from(socket_fd);
}

At this point the state of the program is as you'd expect:

  • FD-0 to 2 are usable as stdin (empty), stdout (ignored) and stderr (ignored). They aren't used/owned by anything else and live until the program terminates.
  • FD-3 is exclusively owned by UnixListener, just like any normal file descriptor

If you want to be extra careful (at the cost of some additional unsafe code), you can use fstat to make sure FD-0 is a socket, and FD-1 to 3 are unallocated. Or use the procfs equivalent, if you prefer.
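For the "is it a socket" half of that check, a sketch that gets by without unsafe, on the assumption that File::metadata on a duplicated descriptor performs the fstat. The FdKind enum and the fd_kind function are names invented for this example. It only works on a descriptor that is already open, so it cannot tell you that FD-1 to 3 are unallocated, and the duplicate briefly occupies one extra descriptor.

```rust
use std::fs::File;
use std::io;
use std::os::fd::AsFd;
use std::os::unix::fs::FileTypeExt;

/// Coarse classification of what a descriptor refers to (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
enum FdKind {
    Socket,
    Pipe,
    CharDevice,
    Other,
}

/// Classifies an open descriptor by looking at its file type.
fn fd_kind(fd: impl AsFd) -> io::Result<FdKind> {
    // Duplicate first, so dropping the temporary File closes only the copy.
    let dup = fd.as_fd().try_clone_to_owned()?;
    let file_type = File::from(dup).metadata()?.file_type();
    Ok(if file_type.is_socket() {
        FdKind::Socket
    } else if file_type.is_fifo() {
        FdKind::Pipe
    } else if file_type.is_char_device() {
        FdKind::CharDevice
    } else {
        FdKind::Other
    })
}
```

At startup this would be called as fd_kind(std::io::stdin()), before anything replaces FD-0.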

Using expect in this code is a bit problematic, since it writes to stderr, which might be unallocated or point to something unexpected.

libstd already does ensure that 0-2 are open as part of pre-main startup.

That simplifies things a bit.

This code should work:

use std::fs::File;
use std::io;
use std::io::stdin;
use std::os::fd::{AsFd, AsRawFd};
use std::os::linux::net::SocketAddrExt;
use std::os::unix::net::{SocketAddr, UnixListener};

use nix;
use nix::sys::socket::getpeername;
use nix::unistd::dup2_stdin;

pub fn init_fcgi() -> io::Result<UnixListener> {
    if getpeername::<()>(stdin().as_raw_fd()) != Err(nix::Error::ENOTCONN) {
        return Err(io::Error::other(
            "Not a FastCGI application (FD-0 is not a listener socket)",
        ));
    }
    let file = File::open("/dev/null")?;
    let socket_fd = stdin().as_fd().try_clone_to_owned()?;
    dup2_stdin(file)?; // atomically replace stdin
    Ok(UnixListener::from(socket_fd))
}

fn main() {
    // create dummy listener to test without FastCGI
    dup2_stdin(
        UnixListener::bind_addr(&SocketAddr::from_abstract_name("fcgi-test").unwrap()).unwrap(),
    )
    .unwrap();

    let socket = init_fcgi().unwrap();
    dbg!(socket);
}

Thanks to the nix crate it even avoids unsafe.
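If pulling in nix is not an option, the getpeername test can probably be approximated with std alone, since UnixStream::peer_addr is a wrapper around the same call. This is a sketch under that assumption, and is_unconnected_socket is a made-up name. Like the nix version, it cannot distinguish a listening socket from one that is merely unconnected.

```rust
use std::io::{self, ErrorKind};
use std::os::fd::AsFd;
use std::os::unix::net::UnixStream;

/// Returns Ok(true) when `fd` is a socket without a peer, which is how a
/// listening socket answers getpeername(). Hypothetical helper.
fn is_unconnected_socket(fd: impl AsFd) -> io::Result<bool> {
    let dup = fd.as_fd().try_clone_to_owned()?;
    // The UnixStream wrapper is only a vehicle for calling getpeername();
    // nothing is read from or written to it, and dropping it closes the copy.
    match UnixStream::from(dup).peer_addr() {
        Ok(_) => Ok(false),
        Err(e) if e.kind() == ErrorKind::NotConnected => Ok(true),
        // Anything else (for example, FD is not a socket at all) is an error.
        Err(e) => Err(e),
    }
}
```

In init_fcgi the first condition would then become something like !is_unconnected_socket(stdin())?, with the dup2 step still needing nix or another route.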

Notes for reference, should anyone else ever have to do this:

That init_fcgi setup code does work. This whole thing is overly complex for historical reasons. CGI just used classic UNIX pipes between the server (usually Apache) and the application. Read from stdin, write to stdout.

But some people wanted fancier I/O features, so they could do async, non-blocking reads, and such. So pipes were switched to UNIX sockets, at least for Unix/Linux platforms. Windows is apparently still using pipes. The init_fcgi code won't work on Windows. No idea about macOS.

But Unix sockets were never intended for parent/child communication. A child can inherit its parent's file descriptors, and setting up pipes to child processes is core UNIX functionality. Named UNIX sockets don't quite work that way. Hence the need for a workaround.

An additional note: should you ever have to write an FCGI responder, the protocol is rather touchy. Although the protocol is built around records with 8-byte binary headers, some old CGI rules still apply. Responder to Apache responses must be almost exactly like this:

1. FcgiHeader { version: 1, rec_type: Stdout, id: 1, content_length: 57, padding_length: 0 } Data: "Status: 200 OK\r\nContent-Type: text/plain; charset=utf-8\n\n"
2. FcgiHeader { version: 1, rec_type: Stdout, id: 1, content_length: 0, padding_length: 0 } Data: ""
3. FcgiHeader { version: 1, rec_type: Stdout, id: 1, content_length: 1837, padding_length: 0 } Data: "Your HTML and content goes here"
4. FcgiHeader { version: 1, rec_type: Stdout, id: 1, content_length: 0, padding_length: 0 } Data: ""
5. FcgiHeader { version: 1, rec_type: EndRequest, id: 1, content_length: 2, padding_length: 0 } Data: "\0\0"

The Status: 200 OK\r\nContent-Type: text/plain; charset=utf-8\n\n part, including the \r\n and \n\n, is parsed very strictly. Any syntax error results in the Apache "Premature end of script headers" message.
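For anyone building those records by hand, here is a minimal sketch of the 8-byte header layout as I understand it from the FastCGI specification: version, type, request id (big-endian), content length (big-endian), padding length, reserved. The function names are invented for this example, and the type value 6 for Stdout is taken from the spec rather than from the listing above.

```rust
/// Protocol version and the Stdout record type, per the FastCGI specification.
const FCGI_VERSION_1: u8 = 1;
const FCGI_STDOUT: u8 = 6;

/// Builds one record: an 8-byte header followed by the content, no padding.
/// Hypothetical helper; panics if the content exceeds the 16-bit length field.
fn encode_record(rec_type: u8, request_id: u16, content: &[u8]) -> Vec<u8> {
    let len = u16::try_from(content.len()).expect("record content is limited to 65535 bytes");
    let mut out = Vec::with_capacity(8 + content.len());
    out.push(FCGI_VERSION_1);
    out.push(rec_type);
    out.extend_from_slice(&request_id.to_be_bytes());
    out.extend_from_slice(&len.to_be_bytes());
    out.push(0); // padding length
    out.push(0); // reserved
    out.extend_from_slice(content);
    out
}

/// Splits a header into (type, request id, content length, padding length).
fn decode_header(h: &[u8; 8]) -> (u8, u16, u16, u8) {
    (
        h[1],
        u16::from_be_bytes([h[2], h[3]]),
        u16::from_be_bytes([h[4], h[5]]),
        h[6],
    )
}
```

Encoding the status block from record 1 with encode_record(FCGI_STDOUT, 1, ...) gives a content length of 57, matching the listing, and the empty Stdout records are the same call with an empty slice.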

Unix sockets work perfectly fine for parent-child communication; we even use that in std. I think FastCGI sends a listener socket instead of a connected one because it doesn't want to send one connection, it wants to send many.
Named Unix sockets should also work for that, but maybe there are some permission problems they were trying to work around? Or it just inherited (heh) the pattern from CGI.
