Would it be theoretically possible to run a Cargo Registry without a database?

I'd benefit tremendously from having a private Cargo registry on the cheap. Most solutions I have found require either a live local filesystem or a database such as Postgres.

Since this is for personal use, I'm trying to find a solution which is serverless and without the need for local filesystem persistence. A couple ideas come to mind:

  1. I could build an AWS Lambda function behind API Gateway to provide a serverless HTTP service, store the data in S3, and wrap an API key system around it to restrict access.
  2. If a database is explicitly required, I could use DynamoDB, which is pay-as-you-go.
  3. If local filesystem persistence is absolutely needed, I could use an EFS shared filesystem, which IIRC is also pay-as-you-go.
  4. Although it lacks official support for Rust, AWS CodeArtifact may be an option, as you can store arbitrary tarballs, but I am not sure what actual compute is needed for the registry.

I don't think such a thing exists and I may consider actually building it. The main question I have is whether this is even theoretically possible in the first place. Hopefully some of you know more about how the registry works.

I built a small registry myself a while ago (might want to revisit it though...), which doesn't use a database, only the filesystem. It's probably possible to do without a filesystem too, but it'll be highly problematic, since a Cargo registry must expose a Git repository as its index, and I'm not sure whether it's feasible to do this without storing that repository on disk.
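On the "what compute is actually needed" part: apart from serving the index, the piece that needs real code is the publish endpoint (`PUT /api/v1/crates/new`). As far as I know, Cargo sends one binary body there: a 32-bit little-endian length, the JSON metadata, another 32-bit little-endian length, then the `.crate` tarball. Here is a rough sketch of splitting that body, which is about all a Lambda would have to do before writing the pieces to storage. The function names are made up for illustration and this is not taken from any existing registry.

```rust
/// Reads one length-prefixed chunk: a u32 little-endian length followed by
/// that many bytes. Returns (chunk, remaining input), or None if the input
/// is too short.
fn take_prefixed(input: &[u8]) -> Option<(&[u8], &[u8])> {
    if input.len() < 4 {
        return None;
    }
    let (len_bytes, rest) = input.split_at(4);
    let len =
        u32::from_le_bytes([len_bytes[0], len_bytes[1], len_bytes[2], len_bytes[3]]) as usize;
    if rest.len() < len {
        return None;
    }
    Some(rest.split_at(len))
}

/// Splits a publish request body into (JSON metadata bytes, .crate bytes).
/// Hypothetical helper: any bytes after the tarball are ignored here.
fn split_publish_body(body: &[u8]) -> Option<(&[u8], &[u8])> {
    let (json, rest) = take_prefixed(body)?;
    let (krate, _rest) = take_prefixed(rest)?;
    Some((json, krate))
}
```

The JSON part would then be turned into a new line in the crate's index file, and the tarball stored wherever downloads are served from.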

Thank you, you are a saint :pray:

I guess now I need to research Git and whether it could be backed by blob storage. If it can, it's likely somebody has already done this, though I probably would have heard about it. If a filesystem is mandatory, that's totally fine: AWS API Gateway to a Lambda in a VPC with an attached EFS mount would do the trick in that case. It seems like a lot of infrastructure for a registry, but it is what it is; I'd rather pay $0 per month when I'm not using it than have fixed costs.

IIRC running a private registry only became an option within the last few years. Hopefully AWS will add a Cargo registry to CodeArtifact; that would put private registries in the hands of engineers with only a few lines of code. No kidding, it took around 10 lines of Terraform to create a pull-through NPM registry.

Git's "dumb" transport is able to work with blob stores such as S3. Git will retrieve files at known relative paths to determine what refs and objects exist, and will construct the paths to those objects on the client side. However, repositories served over the dumb transport need certain housekeeping after every update (notably running `git update-server-info`, typically from a post-update hook), which isn't done by default because the other transports don't need it.

Git doesn't include a transport for most blob stores' native protocols, but if your blob store exposes (possibly authenticated) plain HTTP/HTTPS, Git (and thus Git-based tools) can probably talk to it.
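To make the "known relative paths" part concrete, here is a small sketch of how the key for a loose object could be derived if the bare repository were uploaded to a bucket as-is. The function name and the `index.git` prefix are my own assumptions. Packed objects are found differently, via `objects/info/packs`, which together with `info/refs` is one of the files `git update-server-info` regenerates.

```rust
/// Files a dumb-HTTP client asks for to discover refs and packs.
/// Both are written by `git update-server-info`.
const INFO_REFS: &str = "info/refs";
const PACK_LIST: &str = "objects/info/packs";

/// Builds the storage key for a loose object from its 40-character SHA-1 hex
/// id: `<prefix>/objects/<first 2 hex chars>/<remaining 38>`.
/// Hypothetical helper; returns None if the id is not 40 hex characters.
fn loose_object_key(repo_prefix: &str, sha1_hex: &str) -> Option<String> {
    if sha1_hex.len() != 40 || !sha1_hex.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    let id = sha1_hex.to_ascii_lowercase();
    let prefix = repo_prefix.trim_end_matches('/');
    Some(format!("{}/objects/{}/{}", prefix, &id[..2], &id[2..]))
}

/// Keys for the two discovery files under the same prefix.
fn discovery_keys(repo_prefix: &str) -> [String; 2] {
    let prefix = repo_prefix.trim_end_matches('/');
    [
        format!("{}/{}", prefix, INFO_REFS),
        format!("{}/{}", prefix, PACK_LIST),
    ]
}
```

So whatever writes to the bucket has to keep those two discovery files up to date on every publish, otherwise clients won't see the new commit.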

I think it's possible to use libgit2 with an "ODB backend" that keeps everything in memory.

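I’m not sure about this, but assuming you can guarantee your users only use a recent enough cargo (the sparse registry protocol was stabilized in 1.68 and became the default for crates.io in 1.70), you don’t need the git index any more, do you?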

I’m coming from what I saw in kellnr, where they are working on removing the git-index support.

I haven’t dug into the codebase yet, but I could totally imagine that the database could be pointed to some other DB than Postgres, or you could start using serverless Postgres :slight_smile:
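For what it's worth, with the sparse protocol the index is just static files fetched over HTTP (a `config.json` at the root plus one file per crate), so it maps quite naturally onto a bucket. A minimal sketch of the per-crate file path, following the directory layout documented for registry indexes; the function name is my own, and it assumes ASCII crate names:

```rust
/// Returns the path of a crate's index file relative to the index root:
///   1-char names -> "1/<name>"
///   2-char names -> "2/<name>"
///   3-char names -> "3/<first char>/<name>"
///   longer names -> "<first two>/<next two>/<name>"
/// Names are lowercased. Returns None for empty or non-ASCII names
/// (assumption: only ASCII crate names need to be handled).
fn index_path(crate_name: &str) -> Option<String> {
    if crate_name.is_empty() || !crate_name.is_ascii() {
        return None;
    }
    let name = crate_name.to_ascii_lowercase();
    let path = match name.len() {
        1 => format!("1/{}", name),
        2 => format!("2/{}", name),
        3 => format!("3/{}/{}", &name[..1], name),
        _ => format!("{}/{}/{}", &name[..2], &name[2..4], name),
    };
    Some(path)
}
```

Each of those files holds one JSON line per published version, so publishing a crate comes down to appending a line to one object and uploading the tarball.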

Ah, thanks, I actually missed the introduction of the sparse protocol. Another reason for updating!
