For the past week or two, I've been working on a project called rpkg. It's a simple packaging solution that takes advantage of delta encoding to reduce file sizes where multiple versions of the same software are concerned. It's fast, efficient, and flexible. For full details, see the linked repository; otherwise, here's a quick example of how it can be useful.
My original impetus for making it was to reduce the amount of wasted space from having many versions of the same software installed (the original culprit was Cargo dependencies, but it applies to any software, including binaries).
Here's an abbreviated example taken from the README of the linked repository. I downloaded the source code of the latest 30 versions of syn from crates.io. By default, they come as gzipped tarfiles; together, they total 7.9MiB[1].
I bundled each version with rpkg, which simply consists of adding a little metadata before the tarfile in the gzip archive. I then made a box (a compressed series of diffs between bundles) containing a diff between each pair of consecutive versions, starting at the oldest. Getting from the oldest version I downloaded to the newest requires applying 29 diffs.
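To make the chain of diffs concrete, here's a minimal sketch in Rust. It is not rpkg's actual format or API: the `Delta` type and the `diff`, `apply`, `make_box`, and `upgrade` functions are hypothetical names, and the toy delta (shared prefix and suffix plus a replaced middle) is a stand-in for a real delta encoder. It only illustrates the shape of the idea: N versions produce N-1 diffs, and upgrading means applying them in order to the base.

```rust
/// Toy delta: keep `prefix` bytes from the start of the old version and
/// `suffix` bytes from its end, replacing everything in between with `middle`.
/// (Hypothetical format for illustration; not rpkg's actual encoding.)
#[derive(Debug, Clone, PartialEq)]
struct Delta {
    prefix: usize,
    suffix: usize,
    middle: Vec<u8>,
}

/// Compute a toy delta that turns `old` into `new`.
fn diff(old: &[u8], new: &[u8]) -> Delta {
    // Length of the common prefix.
    let prefix = old.iter().zip(new).take_while(|(a, b)| a == b).count();
    // The suffix may not overlap the prefix in either version.
    let max_suffix = old.len().min(new.len()) - prefix;
    let suffix = old
        .iter()
        .rev()
        .zip(new.iter().rev())
        .take(max_suffix)
        .take_while(|(a, b)| a == b)
        .count();
    Delta {
        prefix,
        suffix,
        middle: new[prefix..new.len() - suffix].to_vec(),
    }
}

/// Rebuild the new version from the old version and a delta.
fn apply(old: &[u8], delta: &Delta) -> Vec<u8> {
    let mut out = Vec::with_capacity(delta.prefix + delta.middle.len() + delta.suffix);
    out.extend_from_slice(&old[..delta.prefix]);
    out.extend_from_slice(&delta.middle);
    out.extend_from_slice(&old[old.len() - delta.suffix..]);
    out
}

/// One delta per pair of consecutive versions, oldest first.
fn make_box(versions: &[Vec<u8>]) -> Vec<Delta> {
    versions.windows(2).map(|w| diff(&w[0], &w[1])).collect()
}

/// Walk the chain: apply every delta in order, starting from the base.
fn upgrade(base: &[u8], deltas: &[Delta]) -> Vec<u8> {
    deltas.iter().fold(base.to_vec(), |current, d| apply(&current, d))
}
```

With 30 versions, `make_box` would return 29 deltas, and `upgrade` would apply all 29 to get from the base to the newest version.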
The resulting box, together with the "base" (in this case, oldest version) bundle, was only ~879KiB, roughly 9x smaller. Getting from the oldest to the newest version (again, 29 diffs) takes around 0.75 seconds on my old laptop.
But 30 versions is a lot more than you'll usually encounter, so I ran the same test with only the 5 latest versions. I still saw 4.25x space savings (1.4MiB to 328KiB). I tested once more with only 2 versions, and the space was roughly halved (580KiB to 297KiB).
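As a sanity check, the savings can be recomputed from the sizes quoted in this post. This is a small sketch with hypothetical helper names (`mib_to_kib`, `savings_ratio`); since the quoted sizes are rounded, the results can differ slightly from ratios computed on the exact byte counts.

```rust
/// Convert mebibytes to kibibytes (1 MiB = 1024 KiB).
fn mib_to_kib(mib: f64) -> f64 {
    mib * 1024.0
}

/// How many times smaller `after_kib` is than `before_kib`.
fn savings_ratio(before_kib: f64, after_kib: f64) -> f64 {
    before_kib / after_kib
}

/// Ratios for the three tests, using the rounded sizes from the post:
/// 30 versions, 5 versions, and 2 versions.
fn quoted_ratios() -> [f64; 3] {
    [
        savings_ratio(mib_to_kib(7.9), 879.0),
        savings_ratio(mib_to_kib(1.4), 328.0),
        savings_ratio(580.0, 297.0),
    ]
}
```

From the rounded inputs, this gives about 9.2x for 30 versions, about 4.4x for 5 versions, and about 1.95x for 2 versions.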
I'm curious whether anyone is interested in this project. If anyone has ideas on where it could be useful, I'd be very interested to hear them. The project is still in its early stages: for example, the CLI is rudimentary, and since the binary and library are the same crate, the library has some extra dependencies. That said, I don't foresee any changes to the core (some manifest updates are the most drastic I can imagine).
[1] You might think that putting them all in one archive would save space; I did too! However, when I put all the versions in one archive, it came out the same size. I tried both putting the uncompressed directories into a tarfile and compressing it, and putting all the already-compressed files into a tarfile and compressing that.