Api: Taking `R: BufRead` versus taking `R: Read`

Well, okay. I suppose I already know the answer to the question in the title.

If a function requires the functionality of BufRead, then obviously it should take R: BufRead in its public API, rather than taking R: Read and constructing its own BufReader. This is for a couple of reasons:

  • Taking R: BufRead is more inline with the Rust philosophy of having libraries state their requirements (and letting the user decide how they are handled)
  • If my function constructs its own BufReader then it could very well be creating a BufReader<BufReader<_>>.

...yet despite all of that, I still find myself writing functions like:

fn read_a_thing<R: Read>(file: R) -> Result<AThing>
{
    let file = ::std::fs::BufReader::new(file);
    // ... 
}

And I just simply can't bring myself to change this to R: BufRead, and on reflection this is because I find that constructing a BufReader in calling code is, well... really annoying!

  • For every one time I need to write a function that wants a buffered reader, there are half a dozen times I'll want to call it.
  • BufReader lives in std::io, which I virtually always already have an import for in the module where I define these functions (since I usually need it for write!). This is not true in the kinds of places where I use these functions, which typically need nothing but std::fs::File and on rare occasion std::path::PathBuf.
  • It tends to just barely push things over my threshold for what can reasonably fit on a line.
    // awriiight
    let thing = read_a_thing(File::open(path)?)?;
    // aw, hell
    let thing = read_a_thing(BufReader::new(File::open(path)?))?;
    

If I could just call an inherent method on a File, it would be a different story.

What do you think? Are there others who take the "easy way out" like I do? Is there room for better ergonomics here? Or should I just suck it up and start adding use ::std::io::BufReader everywhere?

3 Likes

Here’s one example why you don’t want to be constructing a buffered reader in the subfunctions: Rust Playground

By constructing a BufReader inside the function you’re potentially reading more than what you need to from the passed in Read. In the case where you construct the buffered reader outside the function you can keep it around and pass it to multiple functions that read from it without over reading.

Whether or not this matters in your case I guess depends on whether this function may be used as just part of a read from a single input, or whether it is allowed to just consume its entire input.

6 Likes

There's probably a very obvious reason this isn't done, but wouldn't an impl<R: Read> From<R> for BufReader<R> solve this problem?

I think reading via abstract Read is the right thing to do. I may want to read from another source (like Vec) that doesn't need buffering.

If making a buffered file is annoying, then make a helper function/trait for it.

I own the file crate, so if that's a common need, I can add it there (such as extern crate file; … file::read_buf(path)?)

3 Likes

@Nemo157's point is very worrying! I've gotten away with this because I've only ever used these functions on entire files (and yes, many of the formats simply could not be embedded in a larger stream just due to how they are defined); but it is a habit I absolutely have to break.

If making a buffered file is annoying, then make a helper function/trait for it.

In the one crate in my workspace where I most of my file IO is concentrated, I have open and create helper functions which do error chaining on File::{open,create} to add filenames as context. Certainly for these I could add an open_text helper for getting a BufReader.

The trouble is how far apart all the rest of my different uses are. I have several little CLI stubs and example crates that need to read files. I guess I should somehow just make my helpers available in those crates as well, since currently those crates are not bothering to do the error chaining on File::open either.

2 Likes

If ever there was an indication of a (private?) helper-crate use-case, this is probably it :slight_smile:

I know from my own experience that things like a "helper-crate", a "dependency" and an "import" can feel like very 'heavy' tools for such a seemingly trivial problem.
But "buffering" isn't all that trivial! It's quite hard to get all the edge-cases correct, as for example @Nemo157's playground example shows, and you yourself have noticed the friction everywhere if you don't do it correctly from the start.

I think this 'heavy' feeling is also exaggerated by the still mostly immature tooling.. If just typing "bu" would autocomplete to "BufRead" AND automatically add the use at the top, I think the friction would be a lot less.
If we had "refactor > extract into crate" like we have e.g. in Java, you would have made this helper crate already.

Also, if you come from C or Java backgrounds, where dependency management is still mostly non-existant (as in C), or took literal decades to become somewhat usable (Java/Maven), than the reservations about adding dependencies make total, complete, utter sense: They're a LOT of hassle!

Here probably the npm community has successfully outgrown its reservations, if trivial things like left-pad have their own package.
This really shows that even "trivial" things can be very good candidates for a (helper) package, if the package manager is good enough that the cost of "adding dependencies" is low enough.
I'd definately say cargo is up to the task there :slight_smile:

2 Likes