Which compression crate should I use?

I'm revamping the include_dir crate and one feature I'd like to implement is automatic compression of all files that are embedded in your binary.

Are there any pure Rust compression crates you would recommend?

There are a bunch of crates on crates.io, but I'm not sure which one would best suit my goals:

  • Compress and decompress - quite a few crates only implement compression
  • Trivial to cross-compile - I don't want my users to mess around with setting up a suitable cross-compiling C toolchain
  • Low impact on build times - ideally it wouldn't have many (or any) dependencies

In terms of performance, I'll be using this compression library to compress files inside a procedural macro and lazily decompress the data once at runtime. That means it's okay for decompression to take a bit longer, but compressing should be pretty quick so builds don't take forever.


Is there a way to hack the procedural macro so that it checks the directory's modification time against a cached .bz2, so that:

  1. on changes, it runs tar ... | bzip2 ...
  2. when the .bz2 is newer than the dir, it just reduces to an include_bytes! of the existing cached .bz2?

It depends how hacky we want to be. Procedural macros don't get an $OUT_DIR like crates with a build script, so you would need to stash the tarball in /tmp or something... I'd prefer to avoid things like that, though, because it would add a lot of extra complexity. The entire crate, including the runtime code and the procedural macro, is maybe 600 lines in total :sweat_smile:
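For what it's worth, the freshness check itself is straightforward with just the standard library. Here's a hedged sketch (newest_mtime and cache_is_fresh are hypothetical helpers, not part of include_dir; finding a stable place to stash the cache is the hard part):

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::time::SystemTime;

/// Return the newest modification time found anywhere under `dir`.
/// (Illustrative only; real code would also want to handle symlinks.)
fn newest_mtime(dir: &Path) -> io::Result<SystemTime> {
    let mut newest = fs::metadata(dir)?.modified()?;
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        let meta = entry.metadata()?;
        let mtime = if meta.is_dir() {
            newest_mtime(&entry.path())?
        } else {
            meta.modified()?
        };
        newest = newest.max(mtime);
    }
    Ok(newest)
}

/// True when the cached archive is at least as new as everything in `dir`.
fn cache_is_fresh(dir: &Path, cache: &Path) -> io::Result<bool> {
    match fs::metadata(cache) {
        Ok(meta) => Ok(meta.modified()? >= newest_mtime(dir)?),
        Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(false),
        Err(e) => Err(e),
    }
}
```

Note the >= rather than >: filesystems with coarse timestamp resolution can give the archive and the sources identical mtimes even when the archive was written afterwards.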

I'll probably steer away from libraries which store everything in a single blob (e.g. tar balls and zip archives) because that would require switching a lot of the runtime crate's internals depending on whether a hypothetical compression feature flag is enabled.

For more context, you can think of the include_dir!() macro as generating a literal like this:

Dir {
  path: "",
  children: &[
    DirEntry::Dir(Dir {
      path: "src",
      children: &[
        DirEntry::File(File {
          path: "src/lib.rs",
          contents: include_bytes!("src/lib.rs"),
        }),
      ],
    }),
    DirEntry::File(File {
      path: "README.md",
      contents: include_bytes!("README.md"),
    }),
  ],
}
When the compression feature is enabled, I want File to store a once_cell::sync::Lazy in the contents field instead of a &[u8]. That way the data is lazily decompressed the first time you call File's contents() getter.
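A minimal sketch of that shape, using std::sync::OnceLock from the standard library as a stand-in for once_cell::sync::Lazy, and a placeholder decompress function (both are illustrative assumptions, not the crate's actual API):

```rust
use std::sync::OnceLock;

/// Hypothetical stand-in for the real decompression routine.
/// Here it's just the identity function, for illustration only.
fn decompress(compressed: &[u8]) -> Vec<u8> {
    compressed.to_vec()
}

struct File {
    path: &'static str,
    compressed: &'static [u8],
    contents: OnceLock<Vec<u8>>,
}

impl File {
    const fn new(path: &'static str, compressed: &'static [u8]) -> Self {
        File { path, compressed, contents: OnceLock::new() }
    }

    /// Decompress on first access, then hand out the cached bytes.
    fn contents(&self) -> &[u8] {
        self.contents.get_or_init(|| decompress(self.compressed))
    }
}
```

Because OnceLock::new() (like Lazy::new()) is a const fn, the macro can still expand to a plain static literal; the decompression cost only gets paid on the first call to contents().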


I don't have any experience with the Rust libraries, but I would probably go with snap myself, based on who the crate author is: it's a native Rust implementation and it's a fast algorithm.


Have you heard of zstd?

Only problem is that it isn't pure Rust :confused:

Yeah, that's a bit of an issue for me because it makes cross-compiling a lot harder.


Not sure if I would recommend it, but you may like to consider my (one and only published so far) crate, flate3.
It implements RFC 1951 (DEFLATE) compression and decompression. I think it works pretty well; I believe it outperforms flate2, and it is pure Rust.


Another possible option: The brotli crate is implemented in pure safe Rust with very few dependencies. It is developed and used by Dropbox; some development details here.


What is going on with the float64 feature?

Also, if I am reading src/enc correctly, there is more than 2 MB worth of hard-coded data?

My understanding is that the brotli crate is an almost direct port of their C++ library to Rust, which would explain why the code looks weird.

I've got no idea what's up with that float64 feature - it feels a lot like the typedefs every C library uses to name its own number types because people don't know about stdint.h. Switching primitives with a feature flag also sounds like a great way to silently break downstream users...

They probably calculated lookup tables ahead of time. You see that stuff all the time where the author is trading space for performance.


Brotli uses a pre-defined dictionary to aid in text compression. (Most similar compression algorithms build such a dictionary on the fly based on the input stream. Using a pre-populated dictionary achieves greater compression in many common cases.)


Quoting that wikipedia article:

Unlike most general purpose compression algorithms, Brotli uses a pre-defined dictionary, roughly 120 KiB in size, in addition to the dynamically populated ("sliding window") dictionary. The pre-defined dictionary contains over 13000 common words, phrases and other substrings derived from a large corpus of text and HTML documents.[7][3] Using a pre-defined dictionary has been shown to increase compression where a file mostly contains commonly used words.[8]

So they took a crawl of the internet, figured out the optimal dictionary for that crawl, then hard-coded it into the encoder/decoder, so that compressed files can use the dictionary without shipping it as part of the compressed file? Interesting decision.


Well, Brotli was primarily designed for Google to compress its own traffic, so... it makes sense for them.

