How to parallelize compression

I wrote a compression CLI tool. It uses xz2 for compression, which is single-threaded.

The performance of the tool is terrible with slightly larger files, so I wonder how to compress the data in parallel using the same compression algorithm.

The xz2 crate is a layer of encapsulation of lzma-sys.

But I have no programming experience with multithreading or data parallelism. (I have only used data-parallel libraries like rayon as an API caller)

Could you give me some advice? (Including prerequisite knowledge and implementation ideas to achieve the goal)

1 Like

Whether it is parallelizable or not depends heavily on the algorithm. I would suggest you read up on the underlying algorithm (LZMA algorithm, I believe).
After that, you can figure out how you can parallelize the algorithm (there would probably be well known ways already, its such a popular algorithm).
After that, its a simple matter of implementation - if you have a fixed upper bound on the number of threads you will spawn, you can used std::thread or scoped threads (from crossbeam perhaps). If you need to spawn a number of tasks where the bound isn't "well-known", then you can use a threadpool. Rayon provides one (check the docs).

1 Like

From my very limited experience with general-purpose compression algorithms I can tell you that they often compress data in blocks, since their algorithms are not necessarily linear and thus would result in low performance and high RAM usage on larger datasets.

So one trick you could do is to figure out what the default block size of your underlying algorithm is (or well, use that as a default but let the user overwrite it if you want to write a high-quality tool), and then pass blocks of that size to separate threads. I have written something like that with crossbeam scoped threads for some domain-specific lossy compression not too long ago, here is the code: GitHub - sebschmi/homopolymer-compress-rs: Homopolymer compression of genomic data.

Maybe that helps :slight_smile:

4 Likes

In this case the lzma-sys library already does the hard part for you. There's a multi-threaded compression function:

Without this the general approach is to split the input data into X parts, start X threads compressing each part, and then concatenate the results by appropriately modifying the bitstream (for some algorithms that's as easy as basic concatenation, for others it may require rewriting data with knowledge of algorithm's specifics).

In rayon slice.chunks() will work for in-memory data. If you need to do more complex loading and threading, you'll need to spawn tasks and use crossbeam channels.

4 Likes

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.