I'm revamping the include_dir crate and one feature I'd like to implement is automatic compression of all files that are embedded in your binary.
Are there any pure Rust compression crates you would recommend?
There are a bunch of crates on crates.io, but I'm not sure which one would best suit my goals:
Compress and decompress - there are quite a few crates that only implement compression
Trivial to cross-compile - I don't want my users to mess around with setting up a suitable cross-compiling C toolchain
Low impact on build times - ideally it wouldn't have many (or any) dependencies
In terms of performance, I'll be using this compression library to compress files inside a procedural macro and lazily decompress the data once at runtime. That means it's okay for decompression to take a bit longer, but compressing should be pretty quick so builds don't take forever.
It depends how hacky we want to be. Procedural macros don't get an $OUT_DIR like crates with a build script, so you would need to stash the tarball in /tmp or something... I'd prefer to avoid things like that though because it'll add a lot of extra complexity. The entire crate, including runtime code and procedural macro, is maybe 600 lines in total.
I'll probably steer away from libraries which store everything in a single blob (e.g. tarballs and zip archives) because that would require switching a lot of the runtime crate's internals depending on whether a hypothetical compression feature flag is enabled.
For more context, you can think of the include_dir!() macro as generating a literal like this:
When I enable the compression feature I want File to store a once_cell::sync::Lazy in the contents field instead of a &[u8]. That way the data is lazily decompressed when you call File's contents() getter.
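To make the lazy-decompression idea concrete, here is a minimal sketch. It is not include_dir's actual definition: the File fields and constructor are hypothetical, std::sync::OnceLock (Rust 1.70+) stands in for once_cell::sync::Lazy so the example needs no dependencies, and toy_decompress is a made-up run-length decoder standing in for whichever compression crate ends up being used.

```rust
use std::sync::OnceLock;

/// Hypothetical stand-in for a real decompressor: the input is a sequence
/// of (count, byte) pairs, and each pair expands to `count` copies of `byte`.
fn toy_decompress(compressed: &[u8]) -> Vec<u8> {
    let mut out = Vec::new();
    for pair in compressed.chunks_exact(2) {
        out.extend(std::iter::repeat(pair[1]).take(pair[0] as usize));
    }
    out
}

/// Hypothetical shape of an embedded file when compression is enabled.
pub struct File {
    path: &'static str,
    /// Bytes the macro would embed in the binary, already compressed.
    compressed: &'static [u8],
    /// Filled in on the first call to `contents()`, then reused.
    contents: OnceLock<Vec<u8>>,
}

impl File {
    /// `const fn` so the macro could emit this in a `static` or `const`.
    pub const fn new(path: &'static str, compressed: &'static [u8]) -> Self {
        File {
            path,
            compressed,
            contents: OnceLock::new(),
        }
    }

    pub fn path(&self) -> &'static str {
        self.path
    }

    /// Decompresses at most once; later calls return the cached bytes.
    pub fn contents(&self) -> &[u8] {
        self.contents
            .get_or_init(|| toy_decompress(self.compressed))
    }
}

/// Roughly what the macro expansion might look like for a single file.
static README: File = File::new("README.md", &[3, b'a', 2, b'b']);
```

Calling README.contents() a second time hands back the same buffer rather than decompressing again, so the getter can keep returning a plain &[u8] whether or not the feature flag is on.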
I don't have any experience with the Rust libraries, but I would probably go with snap myself based on who the crate author is. It's a native Rust implementation and it's a fast algorithm.
Not sure if I would recommend it, but you may like to consider my (one and only published so far) crate flate3.
It's RFC 1951 compression and decompression. I think it works pretty well; I believe it outperforms flate2, and it is pure Rust.
Another possible option: The brotli crate is implemented in pure safe Rust with very few dependencies. It is developed and used by Dropbox; some development details here.
My understanding is that the brotli crate is an almost direct port of their C++ library to Rust so that'll explain why the code looks weird.
I've got no idea what's up with that float64 feature - it feels a lot like the typedefs every C library uses to name their own number types because people don't know about stdint.h. Switching primitives with a feature flag also sounds like a great way to silently break downstream users...
They probably calculated lookup tables ahead of time. You see that stuff all the time where the author is trading space for performance.
Brotli uses a pre-defined dictionary to aid in text compression. (Most similar compression algorithms build such a dictionary on the fly based on the input stream. Using a pre-populated dictionary achieves greater compression in many common cases.)
Unlike most general purpose compression algorithms, Brotli uses a pre-defined dictionary, roughly 120 KiB in size, in addition to the dynamically populated ("sliding window") dictionary. The pre-defined dictionary contains over 13000 common words, phrases and other substrings derived from a large corpus of text and HTML documents.[7][3] Using a pre-defined dictionary has been shown to increase compression where a file mostly contains commonly used words.[8]
So they took a crawl of the internet, figured out the optimal dictionary for that crawl, then hard-coded it into the encoder and decoder, so that compressed files can use this dictionary without including it as part of the compressed file? Interesting decision.
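That is the general idea, as far as I understand it. A toy sketch of the principle follows. It is not Brotli's actual format or dictionary; the DICTIONARY contents, the Token type, and the encode/decode functions are all made up for illustration. The point is only that both sides ship the same table, so the compressed stream can refer to an entry by index instead of spelling it out.

```rust
/// Hypothetical shared dictionary, compiled into both encoder and decoder.
/// Brotli's real dictionary is far larger and was derived from a text corpus.
const DICTIONARY: [&str; 4] = ["the ", "and ", "http://", "compression"];

#[derive(Debug, PartialEq)]
enum Token {
    /// Index into DICTIONARY.
    Dict(u8),
    /// A raw byte that no dictionary entry covered.
    Literal(u8),
}

/// Greedily replaces the longest matching dictionary entry with its index.
fn encode(input: &[u8]) -> Vec<Token> {
    let mut tokens = Vec::new();
    let mut rest = input;
    while let Some(&first) = rest.first() {
        let best = DICTIONARY
            .iter()
            .enumerate()
            .filter(|(_, word)| rest.starts_with(word.as_bytes()))
            .max_by_key(|(_, word)| word.len());
        match best {
            Some((index, word)) => {
                tokens.push(Token::Dict(index as u8));
                rest = &rest[word.len()..];
            }
            None => {
                tokens.push(Token::Literal(first));
                rest = &rest[1..];
            }
        }
    }
    tokens
}

/// Expands tokens back into bytes using the same built-in dictionary.
fn decode(tokens: &[Token]) -> Vec<u8> {
    let mut out = Vec::new();
    for token in tokens {
        match token {
            Token::Dict(index) => {
                out.extend_from_slice(DICTIONARY[*index as usize].as_bytes())
            }
            Token::Literal(byte) => out.push(*byte),
        }
    }
    out
}
```

With this table, the 15-byte input "the compression" encodes to just two dictionary tokens, while input with no dictionary hits falls back to one literal per byte. The trade-off is that the dictionary itself has to live in every encoder and decoder binary.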