BufReader type, subtype, generics

This is probably very basic but it's stymied me so far.

I am writing something that will read several large line-based logfiles and process them in order of line timestamp—so I want to have a bunch of them open in parallel, reading one line off this, one line off that. Some of these logfiles may be gz- or xz-compressed. What I ideally want to do is to take the filename, send it to an opener function based on type, and get a BufReader back.

I can use File::open and I can use Command::new (to run xzcat or zcat), but they return different types (a File or a ChildStdout). I can in principle feed both of those into a BufReader::new, but I get different BufReader types out as a result. (Yes, I know there are compression libraries. That may be part of the solution.)

Even if I use plain old cat rather than File::open, so that I'm only dealing with one type, BufReader<ChildStdout>, I still can't pass it as a parameter to a fn getnextline. The compiler says I need something that looks like fn getnextline<R>(b: BufReader<R>), so I do that, but then it turns out that b doesn't have a read_line method.

let l = b.read_line(&mut line);
           ^^^^^^^^^ method not found in `BufReader<ChildStdout>`

(Ultimately I'm going to want to put all these bufreaders in a vec, so…. do I need some kind of composite type which can be any BufReader? A custom struct or enum?)

(All code samples in this answer assume use std::io;)

I can in principle feed both of those into a BufReader::new, but I get different BufReader types out as a result.

… (Ultimately I'm going to want to put all these bufreaders in a vec, so…. do I need some kind of composite type which can be any BufReader? A custom struct or enum?)

If you need a single type, use a trait object:

let b: Box<dyn io::BufRead> = Box::new(BufReader::new(file));

std::io::BufRead is the trait that provides the read_line() method. It is implemented for BufReader<R> whenever R implements io::Read, so any such BufReader can be boxed this way.
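
To make that concrete, here is a minimal sketch of keeping readers with different underlying types in one Vec. The in-memory sources (io::Cursor and a byte slice) are stand-ins I've assumed in place of File and ChildStdout so that it runs without any files, and first_lines and sample_readers are hypothetical helper names.

```rust
use std::io::{self, BufRead, BufReader, Cursor};

/// Reads the next line from each reader, whatever its concrete type.
fn first_lines(readers: &mut [Box<dyn BufRead>]) -> io::Result<Vec<String>> {
    let mut out = Vec::new();
    for r in readers.iter_mut() {
        let mut line = String::new();
        r.read_line(&mut line)?; // BufRead method, dispatched dynamically
        out.push(line.trim_end().to_string());
    }
    Ok(out)
}

/// Builds a Vec holding two different concrete BufReader types.
fn sample_readers() -> Vec<Box<dyn BufRead>> {
    // Stand-ins for File and ChildStdout: both sources implement io::Read.
    let a = BufReader::new(Cursor::new(b"alpha\nbeta\n".to_vec()));
    let b = BufReader::new(&b"gamma\ndelta\n"[..]);
    vec![Box::new(a), Box::new(b)]
}
```

Each call to first_lines advances every reader by one line, so the same Vec can be polled repeatedly.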

The read_line method is missing because your signature doesn't say that R implements io::Read, and without that bound io::BufReader<R> doesn't implement io::BufRead. Whenever you write a generic function, you must write trait bounds describing everything it needs. This would work:

fn get_next_line<R: io::Read>(b: &mut io::BufReader<R>)

(It's a separate problem, but you'll almost certainly want to pass the reader by &mut so you don't discard it after the first call.)
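
A minimal sketch of the bounded version with a body filled in. Only the signature comes from this thread; the body and the io::Result<Option<String>> return type are my own assumptions. Note that the BufRead trait also has to be imported for read_line to resolve.

```rust
use std::io::{self, BufRead, BufReader, Read};

/// Returns the next line without its line ending, or None at end of input.
fn get_next_line<R: Read>(b: &mut BufReader<R>) -> io::Result<Option<String>> {
    let mut line = String::new();
    // read_line comes from the BufRead trait. BufReader<R> implements it
    // only when R: Read, which is why the bound above is required.
    let n = b.read_line(&mut line)?;
    if n == 0 {
        return Ok(None); // zero bytes read means end of input
    }
    Ok(Some(line.trim_end_matches(&['\r', '\n'][..]).to_string()))
}
```

Because the reader is borrowed rather than moved, the caller can keep calling this on the same BufReader until it returns None.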

However, putting it together with my above suggestion, you'll want a signature with io::BufRead, which doesn't need any generic or bound:

fn get_next_line(b: &mut dyn io::BufRead)

Or you could make it more generic, able to work with concrete io::BufReaders or dyn io::BufRead:

fn get_next_line<R: ?Sized + io::BufRead>(b: &mut R)

But if your application sticks to Box<dyn io::BufRead> then there is no reason to do that.
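
For comparison, here is a small sketch of the most generic signature being called with a concrete reader, a boxed trait object, and a borrowed trait object. The function body and the demo helper are assumptions for illustration, and byte slices stand in for real files.

```rust
use std::io::{self, BufRead, BufReader};

/// Works with any BufRead, sized or not, so `dyn BufRead` is accepted too.
fn get_next_line<R: ?Sized + BufRead>(b: &mut R) -> io::Result<String> {
    let mut line = String::new();
    b.read_line(&mut line)?;
    Ok(line.trim_end().to_string())
}

/// Calls the same function through a concrete type and through trait objects.
fn demo() -> io::Result<(String, String, String)> {
    // Concrete type: R = BufReader<&[u8]>.
    let mut concrete = BufReader::new(&b"first\n"[..]);
    let a = get_next_line(&mut concrete)?;

    // Boxed trait object: deref the Box so that R = dyn BufRead.
    let mut boxed: Box<dyn BufRead> = Box::new(BufReader::new(&b"second\n"[..]));
    let b = get_next_line(&mut *boxed)?;

    // Borrowed trait object: R = dyn BufRead again.
    let mut third = BufReader::new(&b"third\n"[..]);
    let dynref: &mut dyn BufRead = &mut third;
    let c = get_next_line(dynref)?;

    Ok((a, b, c))
}
```

Without the ?Sized relaxation, R would have to be a sized type and the two dyn BufRead calls would not compile.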

Thanks!

I'm definitely making progress now.

b = BufReader::new(gzopen(&filename));

gives me something I can feed to

fn getnextline(b: &mut dyn BufRead) -> (i64, String) {

and that's working

However (a) my file reader switching isn't working. I ideally want this:

let mut b;
if XZ.is_match(&filename) {
    b = BufReader::new(xzopen(&filename));
} else if GZ.is_match(&filename) {
    b = BufReader::new(gzopen(&filename));
} else {
    b = BufReader::new(File::open(filename).unwrap());
}

but the compiler says "expected ChildStdout, found File" on the last one. Well, yes, that's what the first two limbs give it, for example:

fn gzopen(file: &str) -> std::process::ChildStdout {
    let gz = Command::new("zcat")
        .arg(file)
        .stdout(Stdio::piped())
        .spawn()
        .expect("Failed to start zcat process");
    gz.stdout.expect("unknown failure")
}

so I guess I need to declare b as dyn in some way?

Similarly (b) how can I declare a struct field that will hold b? (Yeah, I can push it onto a Vec, but only by not declaring its content type.)

(The basic idea is that this all feeds into a BTreeMap of timestamp to a vec of (next line, reader object) tuples. Then I pull the entry with the lowest timestamp, process each of those next lines, refresh each reader with the next line, and stick those in as new entries.)

If you have any recommendations for reading on this subject (which I'm guessing is "Traits", but clearly at a pretty basic level, and I don't already speak C/C++), I'd welcome it.

If you want to store it, you probably want to own it, and it'd be something like

let mut b: Box<dyn BufRead> = if XZ.is_match(&filename) {
    Box::new(BufReader::new(xzopen(&filename)))
} else if ... {

If the reader doesn't need to outlive the function, you can avoid the boxing and use conditional initialization instead.

let (mut xz, mut gz, mut plain);
let b: &mut dyn BufRead = if XZ.is_match(&filename) {
    xz = BufReader::new(xzopen(&filename));
    &mut xz
} else if ... {
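
Putting the boxed variant together, a complete opener might look roughly like the sketch below. Several things here are assumptions rather than anything from the thread: the file type is detected with ends_with instead of the XZ and GZ matchers, xzcat and zcat are expected to be on the PATH, errors are returned as io::Result instead of panicking, and open_via and open_log are hypothetical names.

```rust
use std::fs::File;
use std::io::{self, BufRead, BufReader};
use std::process::{ChildStdout, Command, Stdio};

/// Spawns `prog path` (e.g. "zcat" or "xzcat") and returns its stdout pipe.
fn open_via(prog: &str, path: &str) -> io::Result<ChildStdout> {
    let child = Command::new(prog)
        .arg(path)
        .stdout(Stdio::piped())
        .spawn()?;
    child
        .stdout
        .ok_or_else(|| io::Error::new(io::ErrorKind::Other, "child has no stdout"))
}

/// Picks an opener from the file name and hides the reader type behind a Box.
fn open_log(path: &str) -> io::Result<Box<dyn BufRead>> {
    let reader: Box<dyn BufRead> = if path.ends_with(".xz") {
        Box::new(BufReader::new(open_via("xzcat", path)?))
    } else if path.ends_with(".gz") {
        Box::new(BufReader::new(open_via("zcat", path)?))
    } else {
        Box::new(BufReader::new(File::open(path)?))
    };
    Ok(reader)
}
```

The same Box<dyn BufRead> type works for a struct field or a Vec element. One caveat of this sketch is that the child process handle is dropped without being waited on, so a longer-running program may want to keep the Child around and call wait() on it.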

It's not really aimed at "getting started fast" questions like how to return a trait object from an if-else chain, but you can find my tour of dyn Trait here. (It's also focused on dyn Trait and not traits and generics more generally, clearly.)

OK, it looks as though I need to go back to the Rust book and finally learn about how and when to Box<T>, which I've managed to avoid so far. Thanks for both your help; I've marked this one "solved" because my initial question is indeed answered, and now I have some idea of what I need to go away and learn.

If the code is unix-specific you can turn a ChildStdout into a File via File::from(OwnedFd::from(child_stdout))
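
A rough sketch of that conversion, assuming a Unix target and a reasonably recent toolchain (the std::os::fd module is fairly new). The helper names command_stdout_as_file and first_line are hypothetical.

```rust
use std::fs::File;
use std::io::{self, BufRead, BufReader};
use std::os::fd::OwnedFd;
use std::process::{Command, Stdio};

/// Runs `prog args...` and returns its stdout as a plain File (Unix only).
fn command_stdout_as_file(prog: &str, args: &[&str]) -> io::Result<File> {
    let child = Command::new(prog)
        .args(args)
        .stdout(Stdio::piped())
        .spawn()?;
    let stdout = child
        .stdout
        .ok_or_else(|| io::Error::new(io::ErrorKind::Other, "child has no stdout"))?;
    // ChildStdout -> OwnedFd -> File: ownership of the pipe's file
    // descriptor is transferred, nothing is copied.
    Ok(File::from(OwnedFd::from(stdout)))
}

/// Every source now has the same concrete type, so no trait object is needed.
fn first_line(mut reader: BufReader<File>) -> io::Result<String> {
    let mut line = String::new();
    reader.read_line(&mut line)?;
    Ok(line.trim_end().to_string())
}
```

With this approach every reader is a BufReader<File>, whether it came from File::open or from a zcat or xzcat pipe, at the cost of portability.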

By the way—you may find itertools::kmerge useful.
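
If you'd rather stay with the standard library, the BTreeMap plan described earlier in the thread can be sketched as below. This is not itertools; it is a std-only illustration, and it assumes each line starts with an integer timestamp followed by whitespace and that each input is already sorted. The names next_entry and merge_by_timestamp are hypothetical.

```rust
use std::collections::BTreeMap;
use std::io::{self, BufRead};

type Reader = Box<dyn BufRead>;

/// Reads one line and parses its leading integer timestamp.
/// Assumed line format: "<i64 timestamp> <rest of line>".
fn next_entry(r: &mut Reader) -> io::Result<Option<(i64, String)>> {
    let mut line = String::new();
    if r.read_line(&mut line)? == 0 {
        return Ok(None); // this reader is exhausted
    }
    let line = line.trim_end().to_string();
    let ts = line
        .split_whitespace()
        .next()
        .and_then(|t| t.parse::<i64>().ok())
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidData, "missing timestamp"))?;
    Ok(Some((ts, line)))
}

/// Merges all readers into one Vec of lines ordered by timestamp.
fn merge_by_timestamp(readers: Vec<Reader>) -> io::Result<Vec<String>> {
    // timestamp -> every (pending line, reader) pair waiting at that timestamp
    let mut pending: BTreeMap<i64, Vec<(String, Reader)>> = BTreeMap::new();
    for mut r in readers {
        if let Some((ts, line)) = next_entry(&mut r)? {
            pending.entry(ts).or_default().push((line, r));
        }
    }
    let mut out = Vec::new();
    // pop_first removes the entry with the smallest key.
    while let Some((_ts, batch)) = pending.pop_first() {
        for (line, mut r) in batch {
            out.push(line);
            // Refill from the same reader and queue it under its new timestamp.
            if let Some((ts, next)) = next_entry(&mut r)? {
                pending.entry(ts).or_default().push((next, r));
            }
        }
    }
    Ok(out)
}
```

A real program would process each line as it is popped instead of collecting everything into a Vec, which matters for large logfiles.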
