Streaming iterator for decompressing data

Hey, I have been working with compressed formats (Parquet, Avro, etc.), and I noticed a common pattern when consuming data from these formats.

I am trying to systematize the approach in a small crate (lib.rs in jorgecarleitao/streaming-decompressor on GitHub), but I would really appreciate feedback on it, since someone may already have solved this in a better way. The problem is introduced in the accompanying doc, lib.md in the same repository.
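For readers who have not opened the repository, here is a minimal, dependency-free sketch of the general pattern: each compressed page is decompressed into one reused buffer, and each item borrows that buffer. This is not the crate's actual API. `Page`, `rle_decompress` and `Decompressor` are hypothetical names, and a toy run-length decoder stands in for a real codec such as Snappy or Zstd.

```rust
/// Hypothetical page: `data` is either plain bytes or (count, byte) RLE pairs.
pub struct Page {
    pub compressed: bool,
    pub data: Vec<u8>,
}

/// Toy "codec": expands (count, byte) pairs into `out`.
/// `clear` keeps the allocation, so repeated calls reuse the same memory.
fn rle_decompress(input: &[u8], out: &mut Vec<u8>) {
    out.clear();
    for pair in input.chunks_exact(2) {
        let (count, byte) = (pair[0] as usize, pair[1]);
        out.extend(std::iter::repeat(byte).take(count));
    }
}

/// Streaming ("lending") iterator: the returned slice borrows from `self`,
/// so this cannot implement `std::iter::Iterator`.
pub struct Decompressor<I: Iterator<Item = Page>> {
    pages: I,
    buffer: Vec<u8>,       // reused for every compressed page
    current: Option<Page>, // holds an uncompressed page while it is borrowed
}

impl<I: Iterator<Item = Page>> Decompressor<I> {
    pub fn new(pages: I) -> Self {
        Self { pages, buffer: Vec::new(), current: None }
    }

    /// Returns the next page's plain bytes, valid until the next call.
    pub fn next(&mut self) -> Option<&[u8]> {
        let page = self.pages.next()?;
        if page.compressed {
            rle_decompress(&page.data, &mut self.buffer);
            Some(self.buffer.as_slice())
        } else {
            // Uncompressed pages are passed through without copying.
            self.current = Some(page);
            self.current.as_ref().map(|p| p.data.as_slice())
        }
    }
}

fn main() {
    let pages = vec![
        Page { compressed: true, data: vec![3, b'a', 2, b'b'] },
        Page { compressed: false, data: b"xyz".to_vec() },
    ];
    let mut it = Decompressor::new(pages.into_iter());
    while let Some(bytes) = it.next() {
        println!("{}", String::from_utf8_lossy(bytes)); // "aaabb", then "xyz"
    }
}
```

Because each slice borrows the decompressor, the caller has to finish with one page before asking for the next; that constraint is what lets a single buffer be reused instead of allocating once per page.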


I took a very quick look at the code and document, and I think some more documentation of the role each type and trait plays — not just a noun phrase that describes it, but an explanation of how it interacts with the other parts of the system or how to use it — could help people understand what this mechanism is doing.

In particular, I was baffled by Compressed::is_compressed — what does it mean for a Compressed to “not be compressed”? Does it mean “hasn't been decompressed yet”? If so, what effect does that actually have on code using an implementor of the trait, particularly since that trait has no other methods? What would happen if the wrong value was returned, and can that possibility be eliminated by using an Option somewhere else?
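One hypothetical way to act on that last question is to put the state in the type instead of in a `bool`. In this sketch the page is an enum, `compressed()` returns an `Option` that is `None` when there is nothing to decompress, and the decompression step matches on the variant, so no flag can disagree with the data. `Page`, `compressed` and `plain_bytes` are illustrative names, not part of the crate, and the codec is passed in as a closure.

```rust
/// Hypothetical alternative to `is_compressed() -> bool`:
/// whether the bytes need decompressing is encoded in the variant.
pub enum Page {
    /// Bytes that still need to go through the codec.
    Compressed(Vec<u8>),
    /// Bytes that are already plain and can be used as-is.
    Plain(Vec<u8>),
}

impl Page {
    /// The compressed payload, or `None` if there is nothing to decompress.
    pub fn compressed(&self) -> Option<&[u8]> {
        match self {
            Page::Compressed(bytes) => Some(bytes.as_slice()),
            Page::Plain(_) => None,
        }
    }
}

/// Returns the page's plain bytes, using `buffer` only when decompression is needed.
pub fn plain_bytes<'a>(
    page: &'a Page,
    buffer: &'a mut Vec<u8>,
    decompress: impl Fn(&[u8], &mut Vec<u8>),
) -> &'a [u8] {
    match page {
        Page::Compressed(bytes) => {
            decompress(bytes, buffer);
            buffer.as_slice()
        }
        Page::Plain(bytes) => bytes.as_slice(),
    }
}

fn main() {
    // Toy codec for the demo: every compressed byte expands to two copies.
    let double = |input: &[u8], out: &mut Vec<u8>| {
        out.clear();
        for &b in input {
            out.push(b);
            out.push(b);
        }
    };
    let mut buffer = Vec::new();
    let pages = [Page::Compressed(b"ab".to_vec()), Page::Plain(b"xy".to_vec())];
    for page in &pages {
        let needs_codec = page.compressed().is_some();
        let bytes = plain_bytes(page, &mut buffer, &double);
        println!("{} {}", needs_codec, String::from_utf8_lossy(bytes));
    }
}
```

With this shape, returning "the wrong value" is not expressible: a plain page simply has no compressed payload to hand to the codec.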