Why does Cargo download the entire registry?


#1

I’m just idly curious here. Why does Cargo download the entire registry just to fulfill some dependency downloads? I’ve noticed Linux does this as well (apt-get update does at any rate) and always wondered about that too. Why do I need to have a copy of the entire registry just to download a few dependencies? I hate running apt-get update on my (admittedly poor) internet connection, and Cargo’s registry updates turn a simple cargo build into a big ordeal every now and then.


#2

I think the main reason is simplicity. You may have noticed that the registry index is literally just a GitHub repository; if you wanted to be more selective, you’d need to make registry access go through the crates.io API server. It definitely can be done (cpanminus in Perl-land uses a dedicated index service); I’m not in a position to know whether something like that is planned or whether a PR would be accepted.

As for apt-get, both Debian and Red Hat predate the modern cloud ecosystem, and their package managers are designed to operate with all packages on a dumb FTP server, which is why they need to fetch the index and do search operations locally. Convincing fifty mirror operators to set up FTP and rsync was a lot easier in the 90s than getting them to run custom search software, IIUC.


#3

Yeah, I figured it was just historical for Debian. Simplicity is always good, I’ll accept that answer :slight_smile:


#4

I’ll be the guy who says: don’t use apt-get, use apt! It’s better and easier to type.


#5

Any plans for this changing in the future?


#6

The thing cargo syncs is just a git repo containing an index, so the initial fetch might take some time, but subsequent updates are incremental and should be fast. It’s not actually the ‘entire registry’ (which is huge), but an index of the registry. Cargo uses git for this because it’s simple and obviously correct.

The index contains the dependency metadata needed to construct the entire crate dependency graph (a DAG) without downloading the crates themselves. It’s not obvious how to achieve the same thing by only downloading part of the registry index.
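To make that concrete, here is a rough sketch of why having the index locally is enough to work out what needs downloading. It is not Cargo's actual resolver: the `IndexEntry` type, the `sample_index` data, and the `transitive_deps` function are all made up for illustration, and versions, features, and checksums are ignored entirely.

```rust
use std::collections::{BTreeMap, BTreeSet, VecDeque};

/// Simplified, hypothetical stand-in for one index entry: a crate name and
/// the names of its direct dependencies. The real index records more than
/// this (versions, features, checksums), which this sketch ignores.
struct IndexEntry {
    name: &'static str,
    deps: &'static [&'static str],
}

/// A tiny made-up index; the crate names are purely illustrative.
fn sample_index() -> Vec<IndexEntry> {
    vec![
        IndexEntry { name: "app", deps: &["http", "log"] },
        IndexEntry { name: "http", deps: &["bytes", "log"] },
        IndexEntry { name: "log", deps: &[] },
        IndexEntry { name: "bytes", deps: &[] },
        IndexEntry { name: "unused", deps: &["bytes"] },
    ]
}

/// Walk the index breadth-first from `root` and collect every crate that
/// is reachable, using only the metadata in the index.
fn transitive_deps(index: &[IndexEntry], root: &str) -> BTreeSet<String> {
    // Look-up table from crate name to its index entry.
    let by_name: BTreeMap<&str, &IndexEntry> =
        index.iter().map(|e| (e.name, e)).collect();

    let mut seen = BTreeSet::new();
    let mut queue = VecDeque::new();
    queue.push_back(root.to_string());

    while let Some(name) = queue.pop_front() {
        // Skip crates that were already visited.
        if !seen.insert(name.clone()) {
            continue;
        }
        if let Some(entry) = by_name.get(name.as_str()) {
            for dep in entry.deps {
                queue.push_back(dep.to_string());
            }
        }
    }
    seen
}
```

Calling `transitive_deps(&sample_index(), "app")` walks the whole graph without touching any crate contents, and `unused` never shows up because nothing reachable from `app` depends on it. The catch is that you can't know in advance which entries the walk will visit, which is why fetching only part of the index is awkward.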

The cargo source actually contains a lengthy rationale.

One thing cargo could do is a shallow clone. That would probably eliminate all reason to complain about the performance of syncing the index. Cargo doesn’t do that because libgit2 doesn’t support it.


#7

Another reason cargo doesn’t do that is that GitHub’s ops staff don’t like it.


#8

Just throwing in the issue of libgit2 regarding the shallow clones.