I'm a big fan of Arch Linux, but recently I got a bit frustrated with having to dig around the docs to find utilities for various tasks (e.g. checking the sha256 of installed files). I also love programming in Rust, so I went exploring the internals of libalpm/pacman in the hope that I could RIIR, then easily script what I want in small Rust projects, and maybe even make a more fully-featured and easy-to-use pacman-like tool.
I've tried wrapping the alpm library before, but because the tasks of a package manager are long-running, there are lots of function pointers passed around for progress reports etc., and I found it hard to reason about how to make a wrapper library safe. So I started work on a pure-Rust alpm implementation.
The lofty goals I set for the lib were:
- Safe, fast, and matching the behavior of libalpm.
- Offer a richer API taking advantage of Rust's type system.
- Cross-platform (especially Windows).
If the library could achieve this, then it would be possible to build a system like Chocolatey, which isn't my primary motivation for working on it, but glory and all that.
However, it's a big thing to do, and I've discovered that I don't really like some of the internals, and so I'm struggling to maintain motivation. My work so far has been around providing access to the local database of installed packages.
There's loads of complicated behavior here where I had to carefully read the source code to understand exactly what was going on. I'll try to sum it up in bullet points:
- There is a database folder (by default at `/var/lib/pacman/`) where all the package information is stored.
- Within this there are folders for one `local` and zero or more `sync` databases, where package information is stored. The sync databases are stored in a compressed format that I haven't really explored yet.
- Inside the local database folder are lots of subfolders - one for each package.
This is my first little gripe: if you want to read the contents of the local database with any significant number of packages installed, then you're going to be making a lot of system calls.
- Each package is identified by its name and version, both of which are just strings. You find out where to split the name and version by looking for the second-from-last dash (`-`) (urgh, brittle much).
- Each local database entry has 3 files: `desc`, `files`, and `mtree`. `desc` and `files` use a rough-and-ready format (more on this below) and `mtree` uses a gzip-compressed mtree format.
- The first 2 files use a kind of config file format. Here is an example:

      %NAME%
      zvbi

      %VERSION%
      0.2.35-2

      %BASE%
      zvbi

      %DESC%
      VBI capture and decoding library

      %URL%
      http://zapping.sourceforge.net/cgi-bin/view/ZVBI/WebHome

      %ARCH%
      x86_64

      %BUILDDATE%
      1528017784

      %INSTALLDATE%
      1528448772

      %PACKAGER%
      Maxime Gauduin <alucryd@archlinux.org>

      %SIZE%
      1383424

      %REASON%
      1

      %LICENSE%
      GPL

      %VALIDATION%
      pgp

      %DEPENDS%
      libpng
      libx11

  It's essentially a key-value store where values can have multiple entries. One of the bits of work I've done so far is to write a serde implementation for it. The file `desc` contains the package metadata (field names are pretty self-explanatory), while the `files` file contains a list of the files in the package.
- The last file (`mtree`) is the bit that I have a little gripe with. It is in the mtree file format and lists the files again with extra metadata, like size, checksums, permission bits etc. There may be extra files listed here that are artifacts of the build process; the definitive list is in the `files` file.

This is where I got a bit frustrated. To get metadata for a file so you can see if it is present/the right size/the right checksum, you have to read both the `files` file and the `mtree` file, and then filter the files in the `mtree` on the list in `files`. The path representation can be different, so for the comparison you need to normalize paths. Even on my reasonably new i7, reading all packages into memory now takes minutes (I can probably speed it up by just comparing them as strings, but this is brittle and excludes valid paths, which I don't really want to do). There should be one file, in a clearer format (this probably needs to be a binary format so it can deal with the paths, or restrict to utf8 paths, which again I don't really want to do), with all this information in it.
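To make the points above about folder names and the `desc` format a bit more concrete, here is a minimal sketch of both. It is not the serde implementation mentioned above: `split_name_version` and `parse_desc` are hypothetical names made up for this illustration, and the parser assumes a `%KEY%` header line, then one value per line, then a blank line.

```rust
use std::collections::BTreeMap;

/// Split a local database folder name such as "zvbi-0.2.35-2" into
/// (name, version). The name itself may contain dashes, so the split
/// point is the second-from-last dash.
fn split_name_version(folder: &str) -> Option<(&str, &str)> {
    // Byte offsets of the dashes, scanning from the right.
    let mut dashes = folder.rmatch_indices('-').map(|(idx, _)| idx);
    let _last = dashes.next()?; // last dash, inside the version string
    let split = dashes.next()?; // second-from-last dash: name | version
    Some((&folder[..split], &folder[split + 1..]))
}

/// Parse the assumed layout (header, values, blank line) into a map from
/// key to list of values.
fn parse_desc(input: &str) -> BTreeMap<String, Vec<String>> {
    let mut map: BTreeMap<String, Vec<String>> = BTreeMap::new();
    let mut current: Option<String> = None;
    for line in input.lines() {
        let line = line.trim_end();
        if line.is_empty() {
            // A blank line ends the current section.
            current = None;
        } else if line.len() > 2 && line.starts_with('%') && line.ends_with('%') {
            // Header line: strip the surrounding percent signs.
            let key = line[1..line.len() - 1].to_string();
            map.entry(key.clone()).or_default();
            current = Some(key);
        } else if let Some(key) = &current {
            // Every other line inside a section is one value.
            map.get_mut(key)
                .expect("header was inserted above")
                .push(line.to_string());
        }
        // Lines that appear before any header are ignored in this sketch.
    }
    map
}

fn main() {
    let desc = "%NAME%\nzvbi\n\n%VERSION%\n0.2.35-2\n\n%DEPENDS%\nlibpng\nlibx11\n";
    let parsed = parse_desc(desc);
    println!("depends: {:?}", parsed["DEPENDS"]);
    println!("split: {:?}", split_name_version("zvbi-0.2.35-2"));
}
```

Note that a value line which itself starts and ends with `%` would be mistaken for a header here, which is one more way a format like this can be brittle.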
This is where I'm up to. I've got some features I like, like lazy-loading the packages (so you don't have the minute+ wait if you are only interested in 1 package). You can already do some pretty cool stuff. I can't implement the `Iterator` api to get packages because of the lazy loading functionality, but you can get a Vec of all packages, and since they are all behind Rc this is fairly cheap.
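As a rough illustration of the lazy-loading idea (not the library's actual types), the sketch below caches packages behind `Rc` the first time they are asked for. `LocalDb`, `Package`, `entry_name` and the `loads` counter are all hypothetical, and the "expensive" load is faked by parsing the folder name instead of touching the disk.

```rust
use std::cell::{Cell, RefCell};
use std::collections::BTreeMap;
use std::rc::Rc;

/// Hypothetical package record; a real one would carry every `desc` field.
#[derive(Debug)]
struct Package {
    name: String,
    version: String,
}

/// Hypothetical handle to the local database.
struct LocalDb {
    /// Folder names such as "zvbi-0.2.35-2"; listing them is cheap.
    entries: Vec<String>,
    /// Packages parsed so far, keyed by package name.
    cache: RefCell<BTreeMap<String, Rc<Package>>>,
    /// How many times the expensive load has run (for illustration only).
    loads: Cell<usize>,
}

/// Package name part of a folder name (text before the second-from-last dash).
fn entry_name(entry: &str) -> Option<&str> {
    entry.rsplitn(3, '-').nth(2)
}

impl LocalDb {
    fn new(entries: Vec<String>) -> Self {
        LocalDb {
            entries,
            cache: RefCell::new(BTreeMap::new()),
            loads: Cell::new(0),
        }
    }

    /// Stand-in for reading and parsing the entry's files from disk.
    fn load(&self, entry: &str) -> Option<Package> {
        self.loads.set(self.loads.get() + 1);
        let name = entry_name(entry)?;
        let version = &entry[name.len() + 1..];
        Some(Package { name: name.to_string(), version: version.to_string() })
    }

    /// Return the cached package, loading it only on first access.
    fn get_or_load(&self, name: &str, entry: &str) -> Option<Rc<Package>> {
        let cached = self.cache.borrow().get(name).cloned();
        if let Some(pkg) = cached {
            return Some(pkg);
        }
        let pkg = Rc::new(self.load(entry)?);
        self.cache.borrow_mut().insert(name.to_string(), Rc::clone(&pkg));
        Some(pkg)
    }

    /// Look up a single package without loading any of the others.
    fn package(&self, name: &str) -> Option<Rc<Package>> {
        let entry = self
            .entries
            .iter()
            .find(|e| entry_name(e.as_str()) == Some(name))?;
        self.get_or_load(name, entry)
    }

    /// Load whatever is not cached yet and hand out Rc clones of everything.
    fn packages(&self) -> Vec<Rc<Package>> {
        self.entries
            .iter()
            .filter_map(|e| self.get_or_load(entry_name(e)?, e))
            .collect()
    }
}

fn main() {
    let db = LocalDb::new(vec![
        "zvbi-0.2.35-2".to_string(),
        "xorg-server-1.20.0-1".to_string(),
    ]);
    let one = db.package("zvbi").expect("zvbi is listed");
    println!("{} {} (loads so far: {})", one.name, one.version, db.loads.get());
    let all = db.packages();
    println!("{} packages after {} loads", all.len(), db.loads.get());
}
```

Cloning an `Rc` only bumps a reference count, which is why handing out a whole Vec of them stays cheap once the packages have been loaded.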
Once you've got your list you can use all the powerful stuff in Rust like filtering on a maintainer, printing in groups, or whatever other magic you can think of. I've built a command line tool in `examples` with some random examples of what you could do.
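For a flavour of what that looks like, here is a small sketch using plain iterator adapters. It is not taken from the `examples` tool: the `Package` struct, `packaged_by` and `names_by_license` are hypothetical, grouping by license is just one assumed reading of "printing in groups", and the `example-*` packages are made-up data.

```rust
use std::collections::BTreeMap;
use std::rc::Rc;

/// Hypothetical package record holding a few fields from `desc`.
struct Package {
    name: String,
    packager: String,
    license: String,
}

/// Keep only the packages whose %PACKAGER% value contains `needle`.
fn packaged_by(pkgs: &[Rc<Package>], needle: &str) -> Vec<Rc<Package>> {
    pkgs.iter()
        .filter(|pkg| pkg.packager.contains(needle))
        .cloned() // clones the Rc pointer, not the package itself
        .collect()
}

/// Group package names by license; names within a group are sorted.
fn names_by_license(pkgs: &[Rc<Package>]) -> BTreeMap<&str, Vec<&str>> {
    let mut groups: BTreeMap<&str, Vec<&str>> = BTreeMap::new();
    for pkg in pkgs {
        groups
            .entry(pkg.license.as_str())
            .or_default()
            .push(pkg.name.as_str());
    }
    for names in groups.values_mut() {
        names.sort_unstable();
    }
    groups
}

fn main() {
    let pkg = |name: &str, packager: &str, license: &str| {
        Rc::new(Package {
            name: name.to_string(),
            packager: packager.to_string(),
            license: license.to_string(),
        })
    };
    // Only the first entry comes from the desc example; the rest is invented.
    let pkgs = vec![
        pkg("zvbi", "Maxime Gauduin <alucryd@archlinux.org>", "GPL"),
        pkg("example-a", "Some Packager <packager@example.org>", "MIT"),
        pkg("example-b", "Some Packager <packager@example.org>", "GPL"),
    ];

    for p in packaged_by(&pkgs, "alucryd") {
        println!("packaged by alucryd: {}", p.name);
    }
    for (license, names) in names_by_license(&pkgs) {
        println!("{license}: {}", names.join(", "));
    }
}
```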
If you've read this far, then thanks! Please let me know your thoughts. Is this a sensible project? Is it something I should commit more time to? Anything positive or negative is appreciated.
EDITS
- fixed typo (directory -> database)