Is it possible to read data from both the beginning of a file and standard input multiple times?

I want to first read the first few bytes of the input, then read the whole input, including the beginning. Specifically, I want to check whether the data has already been compressed using infer::get before compressing it using zopfli::compress. To perform this check, I want to read the first 262 bytes of the data before compressing it.

The following works for a file, but doesn't work for standard input because it cannot seek:

use std::{
    fs::{self, File},
    io::{self, Read, Seek},
};

enum Input {
    File(fs::File),
    Stdin(io::Stdin),
}

impl Read for Input {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        match *self {
            Self::File(ref mut file) => file.read(buf),
            Self::Stdin(ref mut stdin) => stdin.read(buf),
        }
    }
}

fn main() {
    let mut input = if let Ok(file) = File::open("foo.txt") {
        Input::File(file)
    } else {
        Input::Stdin(io::stdin())
    };

    let mut buf_first_4 = [0; 4];
    // Read first 4 bytes.
    input.read_exact(&mut buf_first_4).unwrap();
    if let Input::File(ref mut file) = input {
        file.rewind().unwrap();
    }

    let mut buf_all = Vec::new();
    // Read all bytes.
    input.read_to_end(&mut buf_all).unwrap();
    assert_eq!(buf_first_4.as_slice(), &buf_all[..4]);
}

To achieve this, do I need to read everything into a Vec<u8>, as in the following code?

use std::{
    fs,
    io::{self, Read},
};

fn main() {
    let input = if let Ok(data) = fs::read("foo.txt") {
        data
    } else {
        let mut buf = Vec::new();
        io::stdin().read_to_end(&mut buf).unwrap();
        buf
    };

    let mut buf_first_4 = [0; 4];
    // Read first 4 bytes.
    input.as_slice().read_exact(&mut buf_first_4).unwrap();

    let mut buf_all = Vec::new();
    // Read all bytes.
    input.as_slice().read_to_end(&mut buf_all).unwrap();
    assert_eq!(buf_first_4.as_slice(), &buf_all[..4]);
}

Maybe you want std::io::Cursor?

You might be able to get away with using the BufRead buffer (especially if the header is small). A more robust approach would be to manage a buffer yourself.


I don't think Cursor is suitable for this purpose because it changes the current position.

a more robust but still fairly simple solution is to use BufReader::with_capacity along with fill_buf.
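For illustration, the with_capacity plus fill_buf idea might be sketched as below. The helper name peek_with_fill_buf is hypothetical, and the sketch makes only one fill_buf call, so it may return fewer than n bytes if the underlying read comes back short (for example, on a terminal or a slow pipe).

```rust
use std::io::{self, BufRead, BufReader, Read};

/// Hypothetical helper: copies up to `n` bytes that a single `fill_buf`
/// call makes available, without consuming them from the reader.
/// Assumption: the BufReader was created with a capacity of at least `n`.
fn peek_with_fill_buf<R: Read>(reader: &mut BufReader<R>, n: usize) -> io::Result<Vec<u8>> {
    // `fill_buf` returns the buffered bytes. It performs at most one read
    // on the inner reader, so the slice can be shorter than `n`.
    let available = reader.fill_buf()?;
    let len = available.len().min(n);
    // `consume` is deliberately not called, so later reads on the
    // BufReader still start from the first byte.
    Ok(available[..len].to_vec())
}
```

A caller would build the reader with something like BufReader::with_capacity(262, io::stdin()), peek, and then read the whole input from the same BufReader. The caller still has to check how many bytes actually came back.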

It's more robust in the sense of definitely having a large enough buffer available, but fill_buf doesn't necessarily fill the entire buffer. (Try sending a line to this program for example.)

I don't think there is a std API which, when used as intended, actually fulfills the use case. In addition to the trait, BufReader has a seek_relative method,[1] but some testing demonstrated that the buffer can be cleared in the process of fulfilling a read_exact call that takes multiple reads.


  1. for R: Seek, but Stdin can be converted to a File via the FD types ↩︎


that should only be a problem if stdin is in line-buffered mode, otherwise a single read should return more than enough bytes (assuming i remember how linux pipes work, i'm more familiar with plan9 pipes)

although, it would be nice to have a peek method that tries to look ahead the specified number of bytes, up to a maximum of capacity.

hmm, i'm tempted to draft a PR, trying to remember if a new unstable function requires an RFC, or if it just needs an RFC for stabilization... maybe it still needs an ACP?

Or in other words, you might be able to get away with it but it's not robust.

Anyway, I think we're in agreement.

PR or ACP apparently, though I'm pretty sure I've seen a request to create an ACP on PRs, so probably I would start with the ACP.


:+1:

implementation has been merged.


I'd love to hear the context for that statement. Do you daily drive Plan9? Is there even a Rust port for it?

Which it often is.

Please absolutely do NOT do this. Relying on such brittle behavior will cause nothing but endless frustration for yourself and your users. This sort of thing is the archetypal example of a hard-to-debug issue.

Since io::Read has a chain() method, this is trivial to do correctly, without relying on implementation details or environmental settings, and without needing to read the entire file into memory.
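A minimal sketch of the chain() approach follows; it is not necessarily the code from the linked Playground. The helper name peek_prefix is hypothetical. It reads up to n bytes into a small buffer, then returns a reader that replays those bytes before continuing with the rest of the input, so no seeking is needed.

```rust
use std::io::{self, Cursor, Read};

/// Hypothetical helper: reads up to `n` bytes from `reader` and returns them
/// together with a new reader that yields those same bytes again, followed by
/// the remainder of the original input.
fn peek_prefix<R: Read>(mut reader: R, n: usize) -> io::Result<(Vec<u8>, impl Read)> {
    let mut prefix = Vec::with_capacity(n);
    // `take(n)` caps the read at `n` bytes; `read_to_end` keeps reading until
    // that cap or EOF, so short reads from pipes or terminals are handled.
    reader.by_ref().take(n as u64).read_to_end(&mut prefix)?;
    // Replay the prefix from memory, then continue with the untouched rest.
    let replay = Cursor::new(prefix.clone()).chain(reader);
    Ok((prefix, replay))
}
```

With standard input, one might call peek_prefix(io::stdin().lock(), 262), pass the returned prefix to infer::get, and hand the returned reader to the compressor. Only the prefix is held in memory, and the same code works for a File.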


i used to daily drive plan9, and did a lot of direct syscall programming

now i use linux, where the direct syscalls are less nice, so i use abstractions, like rust


yes, but not when you are reading a binary file from a pipe.