Is this slow update of the crates.io index meant to happen?

[screenshot: Cargo's "Updating crates.io index" progress bar]

With a download speed of over 10 MiB/s, this is taking ages.

Four minutes just to crawl past the 33% mark.

Why would it take so long?

And what's the point of re-scanning the whole crates.io index when I am merely adding a single dependency to my project? Pardon me if I have missed some trivial requirement here.

It's not re-scanning the whole index; Git is using incremental deltas. The changes are just that big.

Starting with a relatively recent (1.7x) version of Cargo, this has been improved and sped up a lot; upgrade your toolchain to the latest version.


Just for reference: the sparse index protocol for registries was stabilised in 1.68.0, and Cargo has used the sparse protocol for crates.io by default since 1.70.0.
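For anyone on 1.68 or 1.69, where the sparse protocol is available but not yet the default, it can be enabled explicitly in Cargo's configuration:

```toml
# .cargo/config.toml (or set CARGO_REGISTRIES_CRATES_IO_PROTOCOL=sparse)
[registries.crates-io]
protocol = "sparse"
```

From 1.70.0 onwards this is unnecessary, since sparse is already the default for crates.io.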


I remember a few years ago leaving Cargo updating the index for over half an hour on a new install. (I say "over half an hour" because I honestly don't know how long it took; I left the room and did something else for a while.)

I would have been ecstatic if it only took 12 minutes.

jofas mentioned that this changed recently, so this might not be entirely true any more, but... Cargo previously had to pull down the entire index to do anything involving dependencies. You might be "merely adding one single dependency", but it needs the latest index to resolve that dependency.

As for why it's grabbing the whole thing: the index is/was hosted as a regular Git repository, for what I believe was a combination of simplicity and cost (making GitHub deal with hosting it).
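For a sense of scale: the Git index is a tree of small files, one per crate, where each line is a JSON record for one published version, roughly like this (abridged; the checksum is a placeholder):

```json
{"name":"serde","vers":"1.0.0","deps":[],"cksum":"<sha256 of the .crate file>","features":{},"yanked":false}
```

Multiply that by every version of every crate ever published, plus the full Git history, and the repository gets large.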

That was a part of my question. If my eyes didn't play a trick on me, before it started going through the delta resolution there was an entry for downloading the index as a whole. I'm certainly not an expert on dependency management, but what makes it impossible to implement HashMap-like indexing, where the index maps each package to its versions and dependencies?

Instead of parsing and delta-ing the whole index, one would then be able to request only the data for the package in question, then recursively walk its dependencies and fetch only what is needed for them.


I should have read that previously, I suppose. That's the thing already in place, then.
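For anyone else finding this later: with the sparse protocol every crate has a well-known path in the index, so the client fetches only the files it needs over HTTPS. A minimal sketch of the published path scheme (illustrative only; a real client would then issue an HTTP GET for each URL):

```rust
/// Index path for a crate, following the published layout rules:
/// 1-, 2-, and 3-character names get numeric prefixes; longer names
/// are filed under their first two and next two characters.
fn index_path(name: &str) -> String {
    let name = name.to_lowercase();
    match name.len() {
        0 => unreachable!("crate names are non-empty"),
        1 => format!("1/{name}"),
        2 => format!("2/{name}"),
        3 => format!("3/{}/{}", &name[..1], name),
        _ => format!("{}/{}/{}", &name[..2], &name[2..4], name),
    }
}

fn main() {
    // With the sparse protocol, Cargo issues one HTTPS request per
    // crate it actually needs instead of cloning the whole repository.
    for name in ["serde", "rand", "cc", "a"] {
        println!("https://index.crates.io/{}", index_path(name));
    }
}
```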

A few bits that sparked some more questions:

> How can the current resolver be adapted to enable parallel fetching of index files? It currently requires that each index file is available synchronously, which precludes parallelism.

This bit seems rather arbitrarily restrictive, if I understand it correctly. Each dependency is only resolved after the previous one has been fetched. Why block/await on that?
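To make the question concrete, here's a purely hypothetical sketch (std-only; fetch_index_file is a stub standing in for the real HTTPS request) contrasting the sequential pattern with speculative prefetching:

```rust
use std::thread;

// Hypothetical stand-in for fetching one index file over HTTPS; the
// real resolver would parse the JSON records it gets back.
fn fetch_index_file(name: &str) -> String {
    format!("index entries for `{name}`")
}

fn main() {
    let crates = ["serde", "tokio", "rand"];

    // Sequential: each fetch must finish before the next one starts
    // (the synchronous pattern the RFC describes).
    for name in crates {
        let _ = fetch_index_file(name);
    }

    // Speculative prefetch: start every currently known fetch at once,
    // then join on the results as the resolver consumes them.
    let handles: Vec<_> = crates
        .into_iter()
        .map(|name| thread::spawn(move || fetch_index_file(name)))
        .collect();
    for handle in handles {
        let _ = handle.join().unwrap();
    }
}
```

The catch, as the quoted bit says, is that the resolver only learns which files it needs as it goes, so prefetching has to guess ahead of the resolution order.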

> crates-io plans to add cryptographic signatures to the index as an extra layer of protection on top of HTTPS. Cryptographic verification of a git index is straightforward, but signing of a sparse HTTP index may be challenging.

Is it not as simple as signing the sub-index itself? For the https://index.example.com/se/rd/serde example, that would produce a signature for the /se part and one for the /rd part, separately. A package would only verify successfully if all of its sub-parts agree with each other. What's the main difficulty here?
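One difficulty I can imagine is freshness: a Git commit hash covers the whole tree atomically, whereas individually signed files could be replayed. A stub sketch (verify stands in for a real crypto library; crate versions are illustrative) of why per-file signatures alone aren't enough:

```rust
// Stub sketch: `verify` stands in for a real signature check. Both files
// carry signatures the registry genuinely produced at some point, so a
// per-file check alone cannot reject the stale one.
struct SignedFile {
    contents: &'static str,
    signature: &'static str, // placeholder for a detached signature
}

fn verify(file: &SignedFile) -> bool {
    // Assume the registry's key signed both versions at publication time.
    file.signature.starts_with("valid")
}

fn main() {
    let current = SignedFile {
        contents: r#"{"name":"serde","vers":"1.0.200"}"#,
        signature: "valid-2024",
    };
    let stale = SignedFile {
        contents: r#"{"name":"serde","vers":"1.0.0"}"#,
        signature: "valid-2016",
    };

    // Both pass: a mirror could replay the old file and still satisfy
    // the check, so freshness, not signing per se, is the hard part.
    assert!(verify(&current));
    assert!(verify(&stale));
    println!("accepted: {} and {}", current.contents, stale.contents);
}
```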
