I want to read the first few bytes of the input, then read the whole input again, including those first bytes. Specifically, I want to use infer::get to check whether the data has already been compressed before compressing it with zopfli::compress. To perform this check, I need to read the first 262 bytes of the data before compressing it.
The following works for a file, but doesn't work for standard input because it cannot seek:
use std::{
    fs::{self, File},
    io::{self, Read, Seek},
};

enum Input {
    File(fs::File),
    Stdin(io::Stdin),
}

impl Read for Input {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        match *self {
            Self::File(ref mut file) => file.read(buf),
            Self::Stdin(ref mut stdin) => stdin.read(buf),
        }
    }
}

fn main() {
    let mut input = if let Ok(file) = File::open("foo.txt") {
        Input::File(file)
    } else {
        Input::Stdin(io::stdin())
    };
    let mut buf_first_4 = [0; 4];
    // Read first 4 bytes.
    input.read_exact(&mut buf_first_4).unwrap();
    // Only a file can be rewound; for stdin the 4 bytes are already consumed.
    if let Input::File(ref mut file) = input {
        file.rewind().unwrap();
    }
    let mut buf_all = Vec::new();
    // Read all bytes.
    input.read_to_end(&mut buf_all).unwrap();
    assert_eq!(buf_first_4.as_slice(), &buf_all[..4]);
}
To achieve this, do I need to read everything into a Vec<u8> first, as in the following code?
use std::{
    fs,
    io::{self, Read},
};

fn main() {
    let input = if let Ok(data) = fs::read("foo.txt") {
        data
    } else {
        let mut buf = Vec::new();
        io::stdin().read_to_end(&mut buf).unwrap();
        buf
    };
    let mut buf_first_4 = [0; 4];
    // Read first 4 bytes.
    input.as_slice().read_exact(&mut buf_first_4).unwrap();
    let mut buf_all = Vec::new();
    // Read all bytes.
    input.as_slice().read_to_end(&mut buf_all).unwrap();
    assert_eq!(buf_first_4.as_slice(), &buf_all[..4]);
}
It's more robust in the sense of definitely having a large enough buffer available, but fill_buf doesn't necessarily fill the entire buffer. (Try sending a line to this program for example.)
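To make the fill_buf point concrete, here is a minimal sketch. Trickle and first_fill_len are hypothetical names invented for this illustration; Trickle stands in for a pipe or terminal that delivers input in small pieces. It relies on BufReader performing a single underlying read when its buffer is empty, which is how current std behaves as far as I know.

```rust
use std::io::{self, BufRead, BufReader, Read};

/// Hypothetical reader that hands out at most `chunk` bytes per `read` call,
/// imitating a pipe or terminal that delivers input in small pieces.
struct Trickle<'a> {
    data: &'a [u8],
    chunk: usize,
}

impl Read for Trickle<'_> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let n = self.chunk.min(buf.len()).min(self.data.len());
        buf[..n].copy_from_slice(&self.data[..n]);
        self.data = &self.data[n..];
        Ok(n)
    }
}

/// Returns how many bytes the first `fill_buf` call makes available.
fn first_fill_len(data: &[u8], chunk: usize, capacity: usize) -> io::Result<usize> {
    let mut reader = BufReader::with_capacity(capacity, Trickle { data, chunk });
    Ok(reader.fill_buf()?.len())
}

fn main() -> io::Result<()> {
    // Ten bytes are pending and the buffer could hold 262, yet the slice
    // returned by `fill_buf` only covers what one underlying read produced.
    let n = first_fill_len(b"0123456789", 3, 262)?;
    println!("fill_buf returned {n} bytes");
    Ok(())
}
```

So a large capacity guarantees room for the prefix, but not that the prefix has actually arrived when fill_buf returns.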
I don't think there is a std API which, when used as intended, actually fulfills the use case. In addition to the trait, BufReader has a seek_relative method,[1] but some testing demonstrated that the buffer can be cleared in the process of fulfilling a read_exact call that takes multiple reads.
[1] Only for R: Seek, but Stdin can be converted to a File via the FD types.
That should only be a problem if stdin is in line-buffered mode; otherwise a single read should return more than enough bytes (assuming I remember how Linux pipes work; I'm more familiar with Plan 9 pipes).
Although, it would be nice to have a peek method that tries to look ahead the specified number of bytes, up to a maximum of capacity.
Hmm, I'm tempted to draft a PR. I'm trying to remember whether a new unstable function requires an RFC, or whether it only needs an RFC for stabilization... maybe it still needs an ACP?
Please absolutely do NOT do this. Relying on such brittle behavior will cause nothing but endless frustration for yourself and your users. This sort of thing is the archetypal example of a hard-to-debug issue.
Since io::Read has a chain() method, this is trivial to do correctly, without relying on implementation details, environmental settings, or needing to read the entire file into memory: Playground
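The Playground code is not reproduced here, so the following is only a minimal sketch of the chain() idea, not necessarily what the linked code does. peek_prefix is a hypothetical helper name, and 262 is the prefix length from the question. The helper reads up to n bytes, keeps a copy for inspection, and returns a reader that replays those bytes before continuing with the rest of the input.

```rust
use std::io::{self, Chain, Cursor, Read};

/// Reads up to `n` bytes from `reader` and returns them together with a
/// reader that yields those same bytes again, followed by the remaining input.
fn peek_prefix<R: Read>(
    mut reader: R,
    n: usize,
) -> io::Result<(Vec<u8>, Chain<Cursor<Vec<u8>>, R>)> {
    let mut prefix = Vec::with_capacity(n);
    // `take` + `read_to_end` keeps reading until `n` bytes or EOF, so short
    // reads (pipe, terminal) and inputs shorter than `n` are both handled.
    reader.by_ref().take(n as u64).read_to_end(&mut prefix)?;
    let replay = Cursor::new(prefix.clone()).chain(reader);
    Ok((prefix, replay))
}

fn main() -> io::Result<()> {
    let (prefix, mut input) = peek_prefix(io::stdin().lock(), 262)?;
    // A real program would inspect `prefix` here (for example with a
    // file-type check) before deciding whether to compress.
    let mut all = Vec::new();
    input.read_to_end(&mut all)?;
    // The replayed stream starts with the bytes that were peeked.
    assert!(all.starts_with(&prefix));
    Ok(())
}
```

Because the returned value is just another Read, it can be handed to anything that accepts a reader, so the whole input never has to be collected into a Vec<u8> first. The only extra memory is the copied prefix.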