When I write a type that transforms data, I would intuitively implement `Write` and also take a `Write` to forward the transformed data to.
For example, here's a transformer that converts letters to uppercase. This example could just as well represent encryption or compression.
```rust
use std::io::Write; // needed for writeln!

let stdout = std::io::stdout();
let mut uppercase_stdout = UppercaseWriter::new(stdout);
writeln!(uppercase_stdout, "hello world")?;
```
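The transformer itself isn't shown in the post; here is a minimal sketch of what an `UppercaseWriter` might look like (my own guess at the implementation, not the original code), wrapping an inner `Write` and uppercasing bytes before forwarding them:

```rust
use std::io::{self, Write};

// Hypothetical `UppercaseWriter`: wraps an inner `Write` and uppercases
// ASCII letters before forwarding them.
struct UppercaseWriter<W: Write> {
    inner: W,
}

impl<W: Write> UppercaseWriter<W> {
    fn new(inner: W) -> Self {
        UppercaseWriter { inner }
    }
}

impl<W: Write> Write for UppercaseWriter<W> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        let upper: Vec<u8> = buf.iter().map(|b| b.to_ascii_uppercase()).collect();
        // Simplification: write the whole transformed buffer, then report
        // the whole input as consumed. A production version would handle
        // partial writes on the inner writer more carefully.
        self.inner.write_all(&upper)?;
        Ok(buf.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        self.inner.flush()
    }
}
```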
But then I thought: in practice the input is probably always something like a file, so one would have to use `std::io::copy()` to send the file through the transformer. It could also be the body of an HTTP request, which is also a `Read`, so again we would need `std::io::copy()`. So would it be more appropriate to take a `Read` and read from that?
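To make that concrete, `std::io::copy` drives any `Read` into any `Write`. In this self-contained sketch a byte slice stands in for the file or HTTP body, and a `Vec<u8>` stands in for the `Write`-based transformer:

```rust
use std::io;

fn main() -> io::Result<()> {
    // Any `Read` works as the source; a `&[u8]` stands in for a file or
    // an HTTP request body.
    let mut input: &[u8] = b"hello world";
    // Any `Write` works as the sink; a `Vec<u8>` stands in for the
    // `Write`-based transformer.
    let mut output: Vec<u8> = Vec::new();
    io::copy(&mut input, &mut output)?;
    assert_eq!(output, b"hello world");
    Ok(())
}
```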
With the output it's similar: instead of writing to a `Write`, I could implement `Read` and let users read from my transformer. If I take a `Read` as input, it almost seems mandatory to make the output a `Read` as well.
So at the very least I have to decide between "be `Write` and take `Write`" and "take `Read` and be `Read`".
If you want to use `Read`/`Write`, you will need to handle both separately to cover both use cases. The general way to create "streaming middleware" (AFAIK) is `Iterator`, where your codec can transform a stream of bytes (via `Read`) into an iterator over frames (either raw data or processed). Iterator adapters are a natural fit for transformations.
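As a rough sketch of that shape in synchronous code (the function name is mine, not an existing API), the codec becomes an iterator adapter over the bytes of any `Read`:

```rust
use std::io::Read;

// Hypothetical adapter: expose the transformation as an iterator over the
// bytes of any `Read`, instead of implementing `Read`/`Write` directly.
fn uppercase_bytes<R: Read>(reader: R) -> impl Iterator<Item = std::io::Result<u8>> {
    reader.bytes().map(|res| res.map(|b| b.to_ascii_uppercase()))
}

fn main() {
    let input: &[u8] = b"hello";
    let out: Vec<u8> = uppercase_bytes(input).map(|r| r.unwrap()).collect();
    assert_eq!(out, b"HELLO");
}
```

A frame-based codec would do the same thing one level up: consume bytes, yield an iterator of decoded frames.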
When dealing with async, you would use something like `tokio_util::codec` (docs.rs), which uses this basic model that I've attempted to describe. If nothing else, maybe the documentation will help guide your design, even if you stick with synchronous I/O.
You don't have to choose. Provide both; `flate2` does that.
For a library crate that makes sense; as part of an application, less so.
If you're just going to be using it with `std::io::copy()`, or some similar case where you are already reading the input from a `Read` and writing the output to a `Write`, then it really doesn't matter; the code is going to look pretty much the same either way.
It only really matters if you are going to have uses that exercise only one side of the I/O. For example, if a caller wants to transform data from an underlying `Read` but then does something with the resulting buffers that doesn't involve `Write`, that would be quite awkward with a `Write`-based transformer: the code would have to use a `Vec<u8>` or similar as the destination and call `std::io::copy()` purely to have the transformation happen.
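A sketch of that awkward shape, with a trivial hypothetical transformer included so the example is self-contained:

```rust
use std::io::{self, Read, Write};

// Minimal stand-in transformer (hypothetical): uppercases bytes and
// forwards them to an inner writer.
struct Upper<W: Write>(W);

impl<W: Write> Write for Upper<W> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        let out: Vec<u8> = buf.iter().map(|b| b.to_ascii_uppercase()).collect();
        self.0.write_all(&out)?;
        Ok(buf.len())
    }
    fn flush(&mut self) -> io::Result<()> {
        self.0.flush()
    }
}

fn main() -> io::Result<()> {
    let mut input: &[u8] = b"hello";
    // The awkward part: the caller only wants the transformed bytes in
    // memory, but a `Write`-based transformer forces a `Vec<u8>` sink
    // plus `io::copy` just to drive the transformation.
    let mut sink = Upper(Vec::new());
    io::copy(&mut input, &mut sink)?;
    let transformed: Vec<u8> = sink.0;
    assert_eq!(transformed, b"HELLO");
    Ok(())
}
```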
So, for libraries it's quite helpful to provide both, but if this is just your own code using it then pick whichever seems simpler, and if there's no obvious choice then just pick arbitrarily and don't worry about it.
I'm very interested in the `Iterator` solution. It seems perfect, especially since the `Read` API is really quite complex with all its conditions: dealing with short reads, not being allowed to return an error once bytes have been read, etc. Iterating byte by byte seems like a much more straightforward solution.
What does it look like? Can you point me to an example where an iterator is (or can be) used to transform a byte stream, like one from a file?
Take a look at the `tokio_util::codec` link I posted earlier. It's pretty much that, but replace (obviously) `AsyncRead` and `AsyncWrite` with `Read` and `Write`, but also (maybe less obviously) replace `Stream` with `Iterator`.
In other words (hopefully not taking too much out of the original context), a quote from Yoshua Wuyts's blog:
In synchronous Rust, the core streaming abstraction is that of `Iterator`. In asynchronous Rust, the core streaming abstraction is `Stream`.
I don't think there is too much currently written about using `Iterator` in synchronous I/O situations. In the case of reading from a file, for instance, your iterator would (probably, but not always) only contain a single element, so it isn't terribly useful as an abstraction in itself. But for a `TcpStream`, iterating over frames starts to make more sense.
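As a small illustration of framing, the standard library already gives you a frame iterator for delimiter-based protocols via `BufRead::split` (the `frames` wrapper here is just a hypothetical convenience name):

```rust
use std::io::{BufRead, BufReader, Read};

// Frame a byte stream into newline-delimited chunks. Any `Read` works:
// a `TcpStream` in practice, a byte slice in this sketch.
fn frames<R: Read>(reader: R) -> impl Iterator<Item = std::io::Result<Vec<u8>>> {
    BufReader::new(reader).split(b'\n')
}

fn main() {
    let input: &[u8] = b"one\ntwo\nthree";
    let msgs: Vec<Vec<u8>> = frames(input).map(|r| r.unwrap()).collect();
    assert_eq!(msgs, vec![b"one".to_vec(), b"two".to_vec(), b"three".to_vec()]);
}
```

From there, a codec is just more iterator adapters applied to each frame.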