Stream processing problem
I think the `io` interface and `Iterator` are two of Rust's great strengths. Writing similar code in a language like C invites disaster: a small mistake or lack of understanding leads to crashes or remote code execution. I have the following use case, and I think that once I understand how to implement it, I will understand streams completely.
Use case
My use case is processing the `riff` format. This format is used as a container format for things like `wav`, amongst others. The description below is shamelessly ripped off from "Microsoft WAVE soundfile format". (I am using a simplified version of the format that captures the key features I need to understand.)
File format
The format is as follows:
- 4 bytes ascii - the ascii bytes for "RIFF" (in hex: `0x52 0x49 0x46 0x46`)
- 4 bytes uint (little-endian) - the length of the stream/file excluding the first field and this field (this means the maximum size is ~4 GiB, the largest value a 32-bit unsigned integer can hold)
- 4 bytes ascii - the format of the riff file (e.g. "WAVE", in hex: `0x57 0x41 0x56 0x45`)
Followed by 0 or more chunks with the following format:
- 4 bytes ascii - the name of the chunk (e.g. "test", in hex: `0x74 0x65 0x73 0x74`)
- 4 bytes uint (little-endian) - the length of the chunk content (excluding name or length)
- variable - the content of the chunk
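The layout above can be sketched as two small header readers. This is only an illustration under the simplified format described here: the names `RiffHeader`, `ChunkHeader`, `read_riff_header` and `read_chunk_header` are hypothetical, and reporting a bad magic value as `io::ErrorKind::InvalidData` is an assumption, not a settled error design.

```rust
use std::io::{self, Read};

/// The fixed 12-byte file header (hypothetical name).
#[derive(Debug, PartialEq)]
struct RiffHeader {
    /// Length of everything after the first 8 bytes.
    size: u32,
    /// Format tag, e.g. b"WAVE".
    format: [u8; 4],
}

/// The fixed 8-byte header in front of every chunk (hypothetical name).
#[derive(Debug, PartialEq)]
struct ChunkHeader {
    name: [u8; 4],
    /// Length of the chunk content, excluding name and length.
    size: u32,
}

fn read_riff_header<R: Read>(r: &mut R) -> io::Result<RiffHeader> {
    // Everything lives in a stack buffer, so no heap allocation is needed.
    let mut buf = [0u8; 12];
    r.read_exact(&mut buf)?;
    if &buf[0..4] != b"RIFF" {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            "missing RIFF magic",
        ));
    }
    Ok(RiffHeader {
        size: u32::from_le_bytes([buf[4], buf[5], buf[6], buf[7]]),
        format: [buf[8], buf[9], buf[10], buf[11]],
    })
}

fn read_chunk_header<R: Read>(r: &mut R) -> io::Result<ChunkHeader> {
    let mut buf = [0u8; 8];
    r.read_exact(&mut buf)?;
    Ok(ChunkHeader {
        name: [buf[0], buf[1], buf[2], buf[3]],
        size: u32::from_le_bytes([buf[4], buf[5], buf[6], buf[7]]),
    })
}
```

Because `&[u8]` implements `Read`, both functions can be exercised against an in-memory byte slice as well as a file or socket.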
Rust Library Interface
I want to create a rust library with the following characteristics:
- Extract the structure where possible, but don't where not (pass through calls to `Read`)
- Be as fast as possible (0 allocations)
- Be easy to use, and easy to compose with other libraries, not making any assumptions about the stream beyond the format above
- Handle errors gracefully
With that in mind, I see the core `struct`s as follows:
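    use std::io::Read;

    // The whole file: owns the underlying stream.
    struct Riff<R: Read> {
        reader: R,
        name: [u8; 4],
        size: u32,
    }

    // A single chunk inside the file.
    struct RiffChunk<R: Read> {
        reader: R,
        name: [u8; 4],
        size: u32,
    }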
with the following methods:

    impl<R: Read> Iterator for Riff<R> {
        type Item = RiffChunk<R>;
        fn next(&mut self) -> Option<RiffChunk<R>> { ... }
    }

    impl<R: Read> Read for RiffChunk<R> {
        fn read(&mut self, buf: &mut [u8]) -> std::io::Result<usize> { ... }
    }
so the stream is converted to an iterator of streams. (I'm ignoring the details of reading the fixed-size values, as this is easy: they all live on the stack and are small, so just `copy`.)
This is the point at which I'm a bit stuck. There will have to be some lifetime restrictions: e.g. no reads from a `RiffChunk` after `next()` has been called on the parent `Riff`. The library will (I assume) want to `move` the `Read`er into the `Riff` struct, and then provide an `into_inner` method to recover the raw stream if necessary.
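One possible shape for that restriction, offered as a sketch rather than a definitive answer: let each chunk borrow the parent mutably, so the borrow checker rejects any read from an old chunk once the parent has advanced. The standard `Iterator` trait cannot express this, because `Iterator::Item` cannot borrow from the iterator itself, so the sketch uses a hypothetical inherent method `next_chunk` instead. The `remaining` field and the choice to silently skip unread chunk bytes are assumptions, and the 12-byte file header is assumed to have been consumed already.

```rust
use std::io::{self, Read};

/// Owns the stream; `remaining` counts unread bytes of the current chunk.
struct Riff<R: Read> {
    reader: R,
    remaining: u64,
}

/// Borrows the parent mutably, so the parent cannot advance while this is in use.
struct RiffChunk<'a, R: Read> {
    name: [u8; 4],
    reader: &'a mut R,
    remaining: &'a mut u64,
}

impl<R: Read> Riff<R> {
    /// Assumes the 12-byte file header has already been consumed.
    fn new(reader: R) -> Self {
        Riff { reader, remaining: 0 }
    }

    /// Recover the raw stream.
    fn into_inner(self) -> R {
        self.reader
    }

    /// Hypothetical stand-in for `Iterator::next`; `Ok(None)` is a clean end of stream.
    fn next_chunk(&mut self) -> io::Result<Option<RiffChunk<'_, R>>> {
        // Skip whatever the caller left unread in the previous chunk.
        if self.remaining > 0 {
            let mut rest = self.reader.by_ref().take(self.remaining);
            let skipped = io::copy(&mut rest, &mut io::sink())?;
            if skipped < self.remaining {
                return Err(io::ErrorKind::UnexpectedEof.into());
            }
            self.remaining = 0;
        }
        // Read the 8-byte chunk header, telling "no more chunks" apart
        // from a header that is cut off part way through.
        let mut header = [0u8; 8];
        let mut filled = 0;
        while filled < header.len() {
            match self.reader.read(&mut header[filled..]) {
                Ok(0) if filled == 0 => return Ok(None),
                Ok(0) => return Err(io::ErrorKind::UnexpectedEof.into()),
                Ok(n) => filled += n,
                Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
                Err(e) => return Err(e),
            }
        }
        let name = [header[0], header[1], header[2], header[3]];
        let size = u32::from_le_bytes([header[4], header[5], header[6], header[7]]);
        self.remaining = u64::from(size);
        Ok(Some(RiffChunk {
            name,
            reader: &mut self.reader,
            remaining: &mut self.remaining,
        }))
    }
}

impl<'a, R: Read> Read for RiffChunk<'a, R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        // Never read past the end of this chunk.
        let max = (*self.remaining).min(buf.len() as u64) as usize;
        let n = self.reader.read(&mut buf[..max])?;
        *self.remaining -= n as u64;
        Ok(n)
    }
}
```

With this shape, holding on to a `RiffChunk` and then calling `next_chunk` again is a compile-time error, which is the restriction described above. The cost is that callers write a `while let Some(chunk) = riff.next_chunk()?` loop instead of using iterator adaptors.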
Questions
- How do I implement the above interface? Do I need helper structs (my guess is yes)?
- What should the Error type look like? Ideally layout errors should contain the underlying `Read`er with no data consumed, so a higher level library could choose to do error recovery if it wanted. With IO errors it's probably not possible to recover the stream.
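To make the second question concrete, here is one hedged sketch of such an error type. A plain `Read` cannot be rewound, so instead of promising "no data consumed" the layout variant hands back the reader together with the bytes already taken from it, and the caller can replay them with `Cursor` and `Read::chain`. The names `RiffError`, `recover` and `open` are hypothetical, and treating a short header as an IO error (`read_exact` reports `UnexpectedEof`) is an assumption.

```rust
use std::io::{self, Chain, Cursor, Read};

/// Hypothetical error type for opening a RIFF stream.
enum RiffError<R> {
    /// The 12 header bytes were read but are not a RIFF header.
    /// `consumed` holds exactly the bytes already taken from `reader`.
    Layout { reader: R, consumed: [u8; 12] },
    /// The underlying stream failed; its position is unknown.
    Io(io::Error),
}

impl<R: Read> RiffError<R> {
    /// For layout errors, rebuild a stream that yields the consumed bytes
    /// first and then the rest of the original reader.
    fn recover(self) -> Option<Chain<Cursor<[u8; 12]>, R>> {
        match self {
            RiffError::Layout { reader, consumed } => {
                Some(Cursor::new(consumed).chain(reader))
            }
            RiffError::Io(_) => None,
        }
    }
}

/// Reads the file header and returns the reader, the format tag and the size.
fn open<R: Read>(mut reader: R) -> Result<(R, [u8; 4], u32), RiffError<R>> {
    let mut consumed = [0u8; 12];
    if let Err(e) = reader.read_exact(&mut consumed) {
        return Err(RiffError::Io(e));
    }
    if &consumed[0..4] != b"RIFF" {
        return Err(RiffError::Layout { reader, consumed });
    }
    let size = u32::from_le_bytes([consumed[4], consumed[5], consumed[6], consumed[7]]);
    let format = [consumed[8], consumed[9], consumed[10], consumed[11]];
    Ok((reader, format, size))
}
```

The replayed stream has a different type from the original reader, which is the price of not requiring `Seek` or an internal buffer.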
I hope all this makes sense. I'm really excited about using Rust to safely process streaming data without any unnecessary allocations - and it feels like the only barrier is my understanding.