When transforming data, how do I decide between implementing Read or Write?

When I write a type that transforms data, I would intuitively implement Write and also take a Write.

For example here's a transformer that converts some letters to uppercase. This example could represent encryption or compression.
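A minimal sketch of what such an UppercaseWriter could look like — the Write impl below is my reconstruction from the usage snippet, not the original code:

```rust
use std::io::{self, Write};

/// Wraps an inner writer and uppercases ASCII letters before forwarding.
struct UppercaseWriter<W: Write> {
    inner: W,
}

impl<W: Write> UppercaseWriter<W> {
    fn new(inner: W) -> Self {
        UppercaseWriter { inner }
    }
}

impl<W: Write> Write for UppercaseWriter<W> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        let upper: Vec<u8> = buf.iter().map(|b| b.to_ascii_uppercase()).collect();
        // Write the whole transformed buffer, then report the original
        // length so callers see a 1:1 mapping of bytes consumed.
        self.inner.write_all(&upper)?;
        Ok(buf.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        self.inner.flush()
    }
}
```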

Usage is:

use std::io::Write;

let stdout = std::io::stdout();
let mut uppercase_stdout = UppercaseWriter::new(stdout);
writeln!(uppercase_stdout, "hello world")?;

But then I thought, in practice the input is probably always something like a file, so one would have to use std::io::copy() to send the file through the transformer. It could also be the body of an HTTP request which is also a Read, so again we would need std::io::copy().

So would it be more appropriate to take a Read and read from that?

With the output it's similar... instead of writing to a Write I could implement Read and let users read from my transformer. If I take a Read as input it almost seems mandatory to make the output a Read as well.
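For comparison, a Read-based version of the same transformer might look like this (again a sketch, with the name UppercaseReader being my assumption):

```rust
use std::io::{self, Read};

/// Wraps an inner reader and uppercases ASCII letters as bytes flow through.
struct UppercaseReader<R: Read> {
    inner: R,
}

impl<R: Read> UppercaseReader<R> {
    fn new(inner: R) -> Self {
        UppercaseReader { inner }
    }
}

impl<R: Read> Read for UppercaseReader<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        // Delegate the read, then transform the filled portion in place.
        let n = self.inner.read(buf)?;
        buf[..n].make_ascii_uppercase();
        Ok(n)
    }
}
```

Note that this only works so neatly because the transformation is byte-for-byte; a codec that changes the length of the data (compression, encryption with padding) has to buffer internally in its read() implementation.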

So at the very least I have to decide between "be Write and take Write" and "take Read and be Read".

Any insights?

If you want to support both Read and Write, you will need to handle each separately to cover both use cases. The general way to create "streaming middleware" (AFAIK) is Iterator, where your codec transforms a stream of bytes (via Read) into an iterator over frames (either raw data or processed). Iterator adapters are a natural fit for transformations.
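One way to sketch this with just the standard library is Read::bytes(), which turns any reader into an iterator of io::Result<u8> that ordinary iterator adapters can transform (the function name uppercase_stream is made up for this example):

```rust
use std::io::Read;

/// Uppercase a byte stream using iterator adapters over `Read::bytes()`.
fn uppercase_stream<R: Read>(input: R) -> std::io::Result<Vec<u8>> {
    input
        .bytes() // yields io::Result<u8>, one byte at a time
        .map(|b| b.map(|b| b.to_ascii_uppercase()))
        .collect() // Result<Vec<u8>, io::Error>: stops at the first error
}
```

Iterating byte by byte over an unbuffered reader is slow, so in practice you would wrap the input in a BufReader, or iterate over larger frames rather than single bytes.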

When dealing with async, you would use something like tokio_util::codec (on docs.rs), which uses the basic model I've attempted to describe. If nothing else, maybe the documentation will help guide your design, even if you stick with synchronous I/O.

You don't have to. Provide both. flate2 does that.


For a library crate that makes sense, as part of an application less so.

If you're just going to be using it with std::io::copy() or some similar case where you are already reading the input from a Read and writing the output to a Write, then it really doesn't matter; the code is going to look pretty much the same either way.

It only really matters if some callers will use only one side of the I/O. For example, if a caller wants to transform data from an underlying Read but then does something with the resulting buffers that doesn't involve Write, a Write-based transformer is quite awkward: the code has to use a Vec<u8> or similar as the destination and call std::io::copy() purely to make the transformation happen.
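Concretely, that awkward pattern looks something like this — Transformer is a stand-in name for any Write-based transformer (here it just uppercases ASCII bytes):

```rust
use std::io::{self, Read, Write};

/// Stand-in for any `Write`-based transformer (hypothetical name):
/// here it simply uppercases ASCII bytes before forwarding.
struct Transformer<W: Write>(W);

impl<W: Write> Write for Transformer<W> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        let upper: Vec<u8> = buf.iter().map(|b| b.to_ascii_uppercase()).collect();
        self.0.write_all(&upper)?;
        Ok(buf.len())
    }
    fn flush(&mut self) -> io::Result<()> {
        self.0.flush()
    }
}

/// A caller that only has a `Read` and wants transformed buffers back
/// has to route everything through an intermediate `Vec<u8>`:
fn transform_to_vec(mut input: impl Read) -> io::Result<Vec<u8>> {
    let mut sink = Vec::new();
    // io::copy exists here purely to make the transformation happen.
    io::copy(&mut input, &mut Transformer(&mut sink))?;
    Ok(sink)
}
```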

So, for libraries it's quite helpful to provide both, but if this is just your own code using it then pick whichever seems simpler, and if there's no obvious choice then just pick arbitrarily and don't worry about it.

I'm very interested in the Iterator solution. It seems perfect, especially since the Read API is really quite complex with all its conditions: dealing with short reads, the rule that an error must not be returned once bytes have already been read, and so on. Iterating byte by byte seems like a much more straightforward solution.

What does it look like? Can you point me to an example where an iterator is (or can be) used to transform a bytes stream like from a file?

Take a look at the tokio_util::codec link I posted earlier. It's pretty much that, but replace (obviously) AsyncRead and AsyncWrite with Read and Write, and also (maybe less obviously) Stream with Iterator.

In other words (hopefully not taking too much out of the original context) a quote from Yoshua Wuyts's blog:

In synchronous Rust, the core streaming abstraction is that of Iterator.
...
In asynchronous Rust the core streaming abstraction is Stream.

I don't think there is too much currently written about using Iterator in synchronous I/O situations. In the case of reading from a file, for instance, your iterator would (probably but not always) only contain a single element, so it isn't terribly useful as an abstraction in itself. But for TcpStream, iterating over frames starts to make more sense.
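For the framing case, the standard library already gives you one frame iterator for free: BufRead::lines() turns any buffered reader into an iterator over newline-delimited frames, and iterator adapters do the transformation. A sketch using an in-memory reader (a TcpStream wrapped in a BufReader would be a drop-in replacement):

```rust
use std::io::{BufRead, BufReader};

fn uppercase_frames(reader: impl BufRead) -> Vec<String> {
    // `lines()` is already a frame iterator: each item is one
    // "\n"-delimited frame, as an io::Result<String>.
    reader
        .lines()
        .map(|line| line.unwrap().to_uppercase())
        .collect()
}

fn main() {
    let reader = BufReader::new("one\ntwo\nthree\n".as_bytes());
    let frames = uppercase_frames(reader);
    assert_eq!(frames, ["ONE", "TWO", "THREE"]);
}
```

A custom codec generalizes this: instead of "\n" as the frame delimiter, your decode step decides where one frame ends and the next begins, which is essentially what tokio_util::codec's Decoder trait does in the async world.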
