Gzp: a multi-threaded compression crate

gzp is a multi-threaded compression crate built on top of flate2 and tokio. It implements Write and is a near drop-in replacement anywhere a writer is used.

It works similarly to pigz in that the output is a concatenation of gzipped blocks. Unlike pigz, it does not keep a running CRC value, and it does not attempt to use information from previous blocks when compressing the next one (which could lead to less effective compression, depending on the input data).
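To make the block layout concrete, here is a minimal std-only sketch of the idea, not gzp's actual code. It splits the input into blocks, wraps each block in its own complete gzip member (own header, own CRC-32, own length) on a separate thread, and concatenates the members in input order. Everything here is an assumption for illustration: the names `crc32`, `gzip_member_stored`, and `compress_blocks` are hypothetical, and each member uses a "stored" (uncompressed) DEFLATE block so the example needs no dependencies, whereas gzp compresses each block with flate2.

```rust
use std::thread;

/// Bitwise CRC-32 (reflected polynomial 0xEDB88320), the checksum gzip uses.
fn crc32(data: &[u8]) -> u32 {
    let mut crc = 0xFFFF_FFFFu32;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            // mask is all ones when the low bit is set, zero otherwise
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}

/// Wrap one chunk in a complete, independent gzip member.
/// ASSUMPTION: a stored (uncompressed) DEFLATE block stands in for real
/// compression, so the chunk must fit in a u16 length field.
fn gzip_member_stored(chunk: &[u8]) -> Vec<u8> {
    assert!(chunk.len() <= u16::MAX as usize, "stored block too large");
    let len = chunk.len() as u16;
    let mut out = Vec::with_capacity(chunk.len() + 23);
    // gzip header: magic, CM=8 (deflate), no flags, mtime=0, XFL=0, OS=255
    out.extend_from_slice(&[0x1f, 0x8b, 0x08, 0x00, 0, 0, 0, 0, 0x00, 0xff]);
    // DEFLATE stored block: BFINAL=1, BTYPE=00, then LEN and its complement
    out.push(0x01);
    out.extend_from_slice(&len.to_le_bytes());
    out.extend_from_slice(&(!len).to_le_bytes());
    out.extend_from_slice(chunk);
    // trailer covers this chunk only: no running CRC across blocks
    out.extend_from_slice(&crc32(chunk).to_le_bytes());
    out.extend_from_slice(&(chunk.len() as u32).to_le_bytes());
    out
}

/// Build one member per block in parallel and concatenate them in order.
/// Spawns one thread per block for simplicity; a real implementation
/// would use a thread pool. `block_size` must be non-zero.
fn compress_blocks(input: &[u8], block_size: usize) -> Vec<u8> {
    thread::scope(|s| {
        let handles: Vec<_> = input
            .chunks(block_size)
            .map(|chunk| s.spawn(move || gzip_member_stored(chunk)))
            .collect();
        // Joining in spawn order keeps the output in input order.
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("block thread panicked"))
            .collect::<Vec<u8>>()
    })
}
```

Because the gzip format allows a file to consist of several members back to back, a standard decompressor that handles multi-member files should read the concatenated output as a single stream. The cost is that each block starts with no history from the previous one, which is the compression-ratio trade-off mentioned above.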

Nice work!
I do have a question: do you have benchmarks relative to just using flate2 directly? The thing that would make me use it over alternatives is if it's faster while otherwise retaining feature parity.

Or is its selling point more that it's written using async style in the first place?

Great question! I'll run some quick benchmarks. It does not expose an async interface and is meant to be used from synchronous code. Tokio just provides a nice threadpool implementation that this operates on top of.

The short answer is that it is much faster, even on small 5 MB datasets. The numbers are in the GitHub repository: sstadick/gzp (Multi-threaded Compression).

Even with a single thread there is already a speedup, just from offloading the writing to a different thread. At the highest thread count I tested (12), gzp is about 6x faster than flate2 alone.
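The single-thread effect can be illustrated with a small std-only sketch. This is not how gzp is implemented; `OffloadWriter` is a hypothetical name, and the sketch assumes an unbounded channel and one background thread. The point is only that the caller's `write` returns as soon as the buffer is handed off, so the caller can keep producing data while the slow write happens elsewhere.

```rust
use std::io::{self, Write};
use std::sync::mpsc::{self, Sender};
use std::thread::{self, JoinHandle};

/// Hypothetical writer that forwards every buffer to a background thread,
/// which owns the real writer and performs the actual writes.
struct OffloadWriter<W: Write + Send + 'static> {
    tx: Sender<Vec<u8>>,
    handle: JoinHandle<io::Result<W>>,
}

impl<W: Write + Send + 'static> OffloadWriter<W> {
    fn new(mut inner: W) -> Self {
        let (tx, rx) = mpsc::channel::<Vec<u8>>();
        let handle = thread::spawn(move || -> io::Result<W> {
            // The loop ends once every sender has been dropped.
            for buf in rx {
                inner.write_all(&buf)?;
            }
            inner.flush()?;
            Ok(inner)
        });
        Self { tx, handle }
    }

    /// Close the channel, wait for pending writes, and return the inner writer.
    fn finish(self) -> io::Result<W> {
        let OffloadWriter { tx, handle } = self;
        drop(tx);
        handle.join().expect("writer thread panicked")
    }
}

impl<W: Write + Send + 'static> Write for OffloadWriter<W> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        // Copy the bytes and hand them off; the caller does not wait for I/O.
        self.tx
            .send(buf.to_vec())
            .map_err(|_| io::Error::new(io::ErrorKind::BrokenPipe, "writer thread exited"))?;
        Ok(buf.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        // Data only reaches the inner writer on the background thread;
        // call `finish` to wait for it.
        Ok(())
    }
}
```

Usage would look like wrapping any `Write + Send + 'static` value with `OffloadWriter::new`, writing to it as usual, and calling `finish()` at the end to surface any I/O error from the background thread. An unbounded channel keeps the sketch short but lets memory grow if the producer outpaces the writer; a bounded channel would apply backpressure.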

These benchmarks are pretty crude, and I think with larger inputs / more threads an even larger speedup could be seen.
