Half-baked alpm (arch linux package manager) implementation

I'm a big fan of Arch Linux, but recently I got a bit frustrated having to dig around the docs to find utilities for various tasks (e.g. checking the sha256 of installed files). I also love programming in Rust, so I had a bit of an exploration of the internals of libalpm/pacman, in the hope that I could RIIR, then easily script what I want in small Rust projects, and maybe even make a more fully-featured and easy-to-use pacman-like tool.

I've tried wrapping the alpm library before, but because the tasks of a package manager are long-running, there are lots of function pointers passed around for progress reports etc., and I found it hard to reason about how to make a wrapper library safe. So I started work on a pure Rust alpm implementation.

The lofty goals I set for the lib were

  • Safe, fast, and match behavior of libalpm.
  • Offer a richer API taking advantage of rust's type system.
  • Cross-platform (especially windows)

If the library could achieve this, then it would be possible to build a system like chocolatey, which isn't my primary motivation for working on it, but glory and all that :stuck_out_tongue:.

However, it's a big thing to do, and I've discovered that I don't really like some of the internals, and so I'm struggling to maintain motivation. My work so far has been around providing access to the local database of installed packages.

There's loads of complicated behavior here where I had to carefully read the source code to understand exactly what was going on. I'll try to sum it up in bullet points:

  • There is a database folder (by default at /var/lib/pacman/) where all the package information is stored.

  • Within this there are folders for one local and zero or more sync databases, where package information is stored. The sync databases are stored in a compressed format that I haven't really explored yet.

  • Inside the local database folder are lots of subfolders - one for each package.

    This is my first little gripe: if you want to read the contents of the local database with any significant number of packages installed, then you're going to be making a lot of system calls.

  • Each package is identified by its name and version, both of which are just strings. You find out where to split the name and version by looking for the second-from-last dash - (urgh brittle much).

  • Each local database entry has 3 files: desc, files, and mtree. desc and files use a rough-and-ready format (more on this below) and mtree uses a gzip-compressed mtree format.

  • The first 2 files use a kind of config file format, here is an example:

    %NAME%
    zvbi
    
    %VERSION%
    0.2.35-2
    
    %BASE%
    zvbi
    
    %DESC%
    VBI capture and decoding library
    
    %URL%
    http://zapping.sourceforge.net/cgi-bin/view/ZVBI/WebHome
    
    %ARCH%
    x86_64
    
    %BUILDDATE%
    1528017784
    
    %INSTALLDATE%
    1528448772
    
    %PACKAGER%
    Maxime Gauduin <alucryd@archlinux.org>
    
    %SIZE%
    1383424
    
    %REASON%
    1
    
    %LICENSE%
    GPL
    
    %VALIDATION%
    pgp
    
    %DEPENDS%
    libpng
    libx11
    

    It's essentially a key-value store where values can have multiple entries. One of the bits of work I've done so far is to write a serde implementation for it. The desc file contains the package metadata (field names are pretty self-explanatory), while the files file contains a list of the files in the package.

  • The last file (mtree) is the bit that I have a little gripe with. It is in the mtree file format and lists the files again with extra metadata, like size, checksums, permission bits etc. There may be extra files listed here that are artifacts of the build process; the definitive list is in the files file.

    This is where I got a bit frustrated. To get metadata for a file so you can see if it is present/the right size/the right checksum, you have to read both the files file and the mtree file, and then filter the files in the mtree on the list in files. The path representation can be different, so for the comparison you need to normalize paths. Even on my reasonably new i7, reading all packages into memory now takes minutes (I can probably speed it up by just comparing them as strings, but this is brittle and excludes valid paths, which I don't really want to do). There should be one file, in a clearer format (this probably needs to be a binary format so it can deal with the paths, or restrict to utf8 paths, which again I don't really want to do), with all this information in.
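To make the layout above concrete, here is a minimal sketch of three of the steps: splitting a folder name at the second-from-last dash, reading the `%KEY%` sections of a desc-style file, and normalizing paths before comparing the files and mtree lists. It is not the library's actual code. The function names are made up for illustration, the parser is a plain hand-rolled one rather than the serde implementation mentioned above, and the leading `./` on mtree paths and the `usr/bin/example` path are assumptions.

```rust
use std::collections::BTreeMap;
use std::path::{Component, Path, PathBuf};

/// Split a local database folder name such as "zvbi-0.2.35-2" into
/// name and version at the second-from-last dash.
fn split_name_version(dir_name: &str) -> Option<(&str, &str)> {
    // Walk the dash positions from the right-hand end of the string.
    let mut dashes = dir_name.rmatch_indices('-').map(|(idx, _)| idx);
    dashes.next()?; // last dash, which sits inside the version ("0.2.35-2")
    let split = dashes.next()?; // second-from-last dash, between name and version
    Some((&dir_name[..split], &dir_name[split + 1..]))
}

/// Parse a desc-style file into key -> list of values.
/// Assumption: a section is a `%KEY%` line, then one value per line,
/// and a blank line ends the section.
fn parse_desc(input: &str) -> BTreeMap<String, Vec<String>> {
    let mut fields: BTreeMap<String, Vec<String>> = BTreeMap::new();
    let mut current: Option<String> = None;
    for line in input.lines() {
        let line = line.trim();
        if line.is_empty() {
            current = None;
        } else if current.is_none()
            && line.len() > 2
            && line.starts_with('%')
            && line.ends_with('%')
        {
            let key = line[1..line.len() - 1].to_string();
            fields.entry(key.clone()).or_default();
            current = Some(key);
        } else if let Some(key) = &current {
            if let Some(values) = fields.get_mut(key) {
                values.push(line.to_string());
            }
        }
    }
    fields
}

/// Normalize a path so that "./usr/bin/x" and "usr/bin/x" compare equal.
/// Assumption: the two lists differ only by a leading "./" or a trailing slash.
fn normalize_path(path: impl AsRef<Path>) -> PathBuf {
    path.as_ref()
        .components()
        .filter(|c| !matches!(c, Component::CurDir))
        .collect()
}

fn main() {
    if let Some((name, version)) = split_name_version("zvbi-0.2.35-2") {
        println!("name = {}, version = {}", name, version);
    }
    let desc = parse_desc("%NAME%\nzvbi\n\n%DEPENDS%\nlibpng\nlibx11\n");
    println!("depends on {:?}", desc.get("DEPENDS"));
    let same = normalize_path("./usr/bin/example") == normalize_path("usr/bin/example");
    println!("paths match: {}", same);
}
```

Comparing `PathBuf` values instead of raw strings is the slower but less brittle route described above. Working with `Path` rather than `&str` also leaves room for non-UTF-8 paths.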

This is where I'm up to. I've got some features I like, like lazy-loading the packages (so you don't have the minute+ wait if you are only interested in 1 package). You can already do some pretty cool stuff. I can't implement the Iterator api to get packages because of the lazy loading functionality, but you can get a Vec of all packages, and since they are all behind Rc this is fairly cheap.
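The lazy-loading idea can be sketched roughly as follows. This is an illustration under assumptions, not the crate's real API: `LocalDb`, `Package`, the `loads` counter and the `example-pkg` entry are all hypothetical, and the expensive step (reading desc, files and mtree from disk) is replaced by a stand-in so the example runs on its own.

```rust
use std::cell::{Cell, RefCell};
use std::collections::HashMap;
use std::rc::Rc;

#[derive(Debug)]
struct Package {
    name: String,
    version: String,
}

struct LocalDb {
    /// name -> version, known cheaply from the folder names alone.
    index: HashMap<String, String>,
    /// Packages that have already been fully loaded.
    cache: RefCell<HashMap<String, Rc<Package>>>,
    /// Counts the expensive loads, only to make the laziness visible.
    loads: Cell<usize>,
}

impl LocalDb {
    fn new(entries: &[(&str, &str)]) -> Self {
        LocalDb {
            index: entries
                .iter()
                .map(|(n, v)| (n.to_string(), v.to_string()))
                .collect(),
            cache: RefCell::new(HashMap::new()),
            loads: Cell::new(0),
        }
    }

    /// Load a single package on first access and share it afterwards.
    fn package(&self, name: &str) -> Option<Rc<Package>> {
        let cached = self.cache.borrow().get(name).cloned();
        if let Some(pkg) = cached {
            return Some(pkg);
        }
        let version = self.index.get(name)?;
        // Stand-in for the slow part: parsing desc, files and mtree.
        self.loads.set(self.loads.get() + 1);
        let pkg = Rc::new(Package {
            name: name.to_string(),
            version: version.clone(),
        });
        self.cache
            .borrow_mut()
            .insert(name.to_string(), Rc::clone(&pkg));
        Some(pkg)
    }

    /// Force every package to load and hand back cheap Rc handles.
    fn packages(&self) -> Vec<Rc<Package>> {
        let mut names: Vec<&String> = self.index.keys().collect();
        names.sort();
        names.into_iter().filter_map(|n| self.package(n)).collect()
    }
}

fn main() {
    // "example-pkg" is a made-up entry for illustration.
    let db = LocalDb::new(&[("zvbi", "0.2.35-2"), ("example-pkg", "1.0-1")]);
    let one = db.package("zvbi").expect("zvbi is in the index");
    println!("{} {} after {} load(s)", one.name, one.version, db.loads.get());
    let all = db.packages();
    println!("{} packages after {} load(s)", all.len(), db.loads.get());
}
```

Asking for one package only pays for that package, and cloning an `Rc` into the `Vec` just bumps a reference count rather than copying the package data.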

Once you've got your list you can use all the powerful stuff in rust like filtering on a maintainer, printing in groups, or whatever other magic you can think of. I've built a command line tool in examples with some random examples of what you could do.
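As a rough illustration of that kind of filtering and grouping, here is a sketch using only standard iterator adapters over a `Vec<Rc<Package>>`. The `Package` struct and the helper names are hypothetical and are not taken from the library or its examples folder. The first packager string comes from the desc sample above, and the second package is made up.

```rust
use std::collections::BTreeMap;
use std::rc::Rc;

struct Package {
    name: String,
    packager: String,
}

/// Names of all packages whose packager field contains `who`.
fn by_packager<'a>(pkgs: &'a [Rc<Package>], who: &str) -> Vec<&'a str> {
    pkgs.iter()
        .filter(|p| p.packager.contains(who))
        .map(|p| p.name.as_str())
        .collect()
}

/// Group package names under their packager, sorted by packager.
fn group_by_packager(pkgs: &[Rc<Package>]) -> BTreeMap<&str, Vec<&str>> {
    let mut groups: BTreeMap<&str, Vec<&str>> = BTreeMap::new();
    for p in pkgs {
        groups
            .entry(p.packager.as_str())
            .or_default()
            .push(p.name.as_str());
    }
    groups
}

/// Sample data: zvbi is from the desc example, the other entry is invented.
fn sample_packages() -> Vec<Rc<Package>> {
    vec![
        Rc::new(Package {
            name: "zvbi".to_string(),
            packager: "Maxime Gauduin <alucryd@archlinux.org>".to_string(),
        }),
        Rc::new(Package {
            name: "example-pkg".to_string(),
            packager: "Jane Doe <jane@example.org>".to_string(),
        }),
    ]
}

fn main() {
    let pkgs = sample_packages();
    println!("{:?}", by_packager(&pkgs, "alucryd"));
    for (packager, names) in group_by_packager(&pkgs) {
        println!("{}: {}", packager, names.join(", "));
    }
}
```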

If you've read this far, then thanks! Please let me know your thoughts. Is this a sensible project? Is it something I should commit more time to? Anything positive or negative is appreciated.

EDITS

  • fixed typo (directory -> database)

I honestly have no idea if this is a good way to use your time, since it's pretty much a solved problem?

Furthermore, I think the "doing various tasks" part is more of a devops issue, and there is a wealth of tools for that.

But without knowing your anticipated use cases in more detail, I can't reach any meaningful conclusions. Either way, if the resulting library/binary is only usable on an Arch Linux style setup, then I don't think spending a lot of time on it is a very good idea.

Perhaps your newly-acquired knowledge could be useful for extending Cargo? There's a need to make cargo install smarter than just dumping executables in the bin folder, but it's not clear how to go from that to cargo being a robust package manager.

https://github.com/rust-lang/rfcs/pull/2376

Personally, I would prefer to see better integration between cargo and system package managers over attempts to make cargo a better system package manager.

The Unix world already has too many distro- and language-specific package managers stepping on each other's toes for my taste.


Me too, but that doesn't seem to be the consensus in that GitHub issue.
