Squash crates.io git history regularly

I like the design of crates.io and how the underlying storage is just a git repo. It's really awesome.

However, the moment you're on a slightly flaky internet connection, updating becomes quite painful. I've just spent 10 minutes waiting for cargo update to pull a few months' worth of updates. To be clear, I'm talking about the "Updating crates.io index" step, not the time it takes to download updated crates. I think this disproportionately impacts people in countries with typically worse internet than the US/EU, and regular contributors/users of Rust are unlikely to experience it.

Is it possible to either:

  1. Regularly squash the crates.io index git history, so there's only a single commit to pull, or
  2. Change cargo to do a shallow clone (git clone --depth=1), so it doesn't pull the intermediate commits?

I'm pretty sure this is done already - for example this is the most recent squash.

I think this is blocked on support for shallow clones in libgit2 itself.

Homebrew recently moved away from shallow clones at GitHub’s request. I believe this is because it puts more load on the server.


There is an accepted RFC, RFC 2789, to additionally serve the crate index in a sparse format over HTTP, directly as static files, so that (by default) you will only need to pull info for the crates that you're using. The git index will still be the source of truth, maintained, and available, but the sparse format will become the default some time in the future.

(The git download, even with regular squashing, won't scale to NPM scale. The sparse API will.)
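To make the "static files" idea a bit more concrete, here is a minimal sketch of how a crate name could be mapped to its file path in the index, as I understand the directory layout the index uses. The function name index_path is made up for illustration, and the sketch assumes crate names are non-empty ASCII; it is not cargo's actual code.

```rust
/// Hypothetical helper (not part of cargo): maps a crate name to the
/// relative path of its metadata file in the index.
/// Assumes the name is non-empty ASCII, as crates.io names are.
fn index_path(name: &str) -> String {
    assert!(!name.is_empty(), "crate names are never empty");
    // Index paths use the lowercased crate name.
    let name = name.to_ascii_lowercase();
    match name.len() {
        // One- and two-character names live in fixed directories.
        1 => format!("1/{name}"),
        2 => format!("2/{name}"),
        // Three-character names are grouped by their first character.
        3 => format!("3/{}/{name}", &name[..1]),
        // Everything else is grouped by the first two and next two characters.
        _ => format!("{}/{}/{name}", &name[..2], &name[2..4]),
    }
}
```

With this layout, a sparse client only needs one small HTTP request per crate in its dependency graph (for example, index_path("serde") gives se/rd/serde, which is appended to the index base URL) instead of fetching the whole git history.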

Tracking issue.
Initial implementation.

