Specifying multiple git mirrors for Cargo git dependencies

Is it possible to just ditch crates.io entirely? Ideally by putting a git commit in Cargo.toml and pointing at a list of mirrors somewhere?

Yes, it's possible to specify Git repos directly as sources for individual packages, as well as pulling them from alternative registries.

You can also use source replacement to download all registry packages from a mirror of that registry.

This is often used together with the cargo vendor command, which can create a local mirror of a group of packages from a registry.

No, like, mirrors of the git repos.

Git is decentralized. Anyone can clone a repo and host a mirror, and the crypto stuff makes sure the mirror is "safe". It's like blockchain, really, except much cheaper to run.

Did you read the first link provided by @H2CO3? Nothing restricts you from using a mirror. Just provide a location to the repository.

However, Cargo does not have a way to specify multiple git sources for a single dependency. Each git dependency has just a single git URL.

Do you mean you'd want to make crates.io decentralized?

Yeah, that's basically what we'd need to be able to make this reliable enough tbh.

Why not just point at a single proxy URL running squid or something, and set up the proxy to load-balance across a set of URLs?

Because it's not actually load-balancing. The point is to have untrusted third parties run git mirrors, potentially with their own commits, and rely on commit hashes being strong enough.

I have to say I don't know what you are trying to accomplish.

You say each private repo would have its own commits, yet be safe because of "crypto stuff". I honestly don't understand what you are saying.

Git is basically a blockchain, so anyone can hold a copy of the chain and you can easily verify it. As for the consensus algorithm, it's basically "someone tells you which branch of the blockchain you should follow".

So you'd declare a SHA-1 dependency (... okay, git really needs to get rid of SHA-1) and set up a list of forks of the project. Then if someone on your list wants to pwn your users, they need to find a hash collision.

I see what you are getting at now. No, Cargo currently doesn't work that way. You'd have to create your own workflow or modify Cargo.
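For what it's worth, a "create your own workflow" version could be sketched roughly like this in Python. It is only an illustration, not something Cargo does: the function names `git_fetch_commit` and `first_working_mirror` are made up for this post, and it assumes the mirrors allow fetching a commit by its full hash, which not every server is configured to permit.

```python
import subprocess
from typing import Callable, Iterable, Optional


def git_fetch_commit(repo_dir: str, url: str, commit: str) -> str:
    """Fetch `commit` from `url` into an existing repo; return the ID git resolved.

    Assumption: the server permits fetching by full commit hash.
    """
    subprocess.run(
        ["git", "-C", repo_dir, "fetch", "--quiet", url, commit],
        check=True,
    )
    out = subprocess.run(
        ["git", "-C", repo_dir, "rev-parse", "--verify", "FETCH_HEAD^{commit}"],
        check=True,
        capture_output=True,
        text=True,
    )
    return out.stdout.strip()


def first_working_mirror(
    mirrors: Iterable[str],
    commit: str,
    fetch: Callable[[str, str], str],
) -> Optional[str]:
    """Try each mirror in order; return the first URL that yields `commit`.

    `fetch(url, commit)` returns the commit ID it obtained, or raises on failure.
    """
    for url in mirrors:
        try:
            got = fetch(url, commit)
        except (subprocess.CalledProcessError, OSError):
            continue  # mirror is down or does not have the commit
        if got == commit:
            return url
    return None  # no mirror could provide the pinned commit
```

To use it against real repositories you would pass something like `functools.partial(git_fetch_commit, "path/to/clone")` as `fetch`. The trust model is the one described above: the pinned hash is the only thing being trusted, so the sketch is exactly as strong as SHA-1 is.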

That's... not actually what Git is. Git is nothing like a blockchain. There are no consensus algorithms to speak of. No verification of the integrity of the acquired data is performed. There is no permanence to a git repository; anyone can delete commits by squashing, by using advanced git commands, or by editing the .git directory by hand if they know the format well enough. There's no way to stop that from happening.
On the other hand, a blockchain is permanent; data you store on it can never be removed or altered in any way. Computers synchronise with the blockchain, but all of them must hold the same data or consensus fails. (Git allows data to be trivially out of sync; I can fork a git repository and let it drift out of sync with the upstream, which is not possible on a blockchain.)
Git and a blockchain have some similarities, but they are not really alike. That's also why Git uses SHA-1 for commit hashes: there's no need for a stronger hash because (1) no one is going to bother trying to create a hash collision for a git commit, since they'd gain nothing from it, and (2) the likelihood of two commits having the same hash is so low as to be negligible. (There are more than 190 million repositories on GitHub, and I doubt a single one of them has ever had a commit hash that matches a commit hash in another repository in all the time GitHub has existed.)
As for the actual question, yes, you can ditch crates.io: create your own registry.

Except in a decentralized git-based system there would be something to gain from creating collisions. The whole point of our project, GAnarchy, is to provide a decentralized alternative to GitHub where nobody "owns" the repo, you just have some forks.

So yes, SHAttered affects git. And git is a Merkle tree, same as a blockchain. And the consensus "algorithm" is that in a perfect world hashes (and thus Merkle trees) are unforgeable, so you can just pick a commit and call it the official one. (Basically a single-party proof-of-stake.) It doesn't matter that you can have multiple blockchains in the same data space (repo) because any single commit hash represents a whole blockchain.

I've never experimented enough to know, but does pulling a git commit by hash truly validate all the blob data against the claimed hashes? Or does it simply assume the blob filenames are what it expects? If you wanted to use it as a security mechanism (which I wouldn't advise because of SHA-1, among other reasons), then this is critically important.
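To make "validate a blob against its claimed hash" concrete: in a SHA-1 repository, a blob's object ID is the SHA-1 of a short header (`blob <size>` followed by a NUL byte) plus the file content, so anyone holding the bytes can recompute it. A minimal sketch in Python, where the function name `git_blob_id` is just made up for this post:

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Recompute the object ID git assigns to a blob with this content.

    Applies to SHA-1 repositories; SHA-256 repositories use a different hash.
    """
    header = b"blob %d\x00" % len(content)  # object type, size, NUL separator
    return hashlib.sha1(header + content).hexdigest()


# The empty blob has the well-known ID e69de29bb2d1d6434b8b29ae775ad8c2e48c5391.
print(git_blob_id(b""))
```

This only shows that the check is cheap to do yourself. It says nothing about which git commands perform it, or when.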

Either way, Cargo doesn't work like this, meaning you'll have to find another mechanism to achieve it.

Yes, altho we're not sure if it's the default. There have been hijacks where malware was introduced into a repo and nobody noticed because they didn't have that validation enabled tho, so we'd hope it to be default nowadays.

I prefer pulling dependencies from a single archive file, rather than from a git commit, and comparing that archive against a known hash. Even with distributed repos, you need a central agreement on which hashes are valid or people will just use whatever is easiest. I wonder if it's just as easy at that point to have people host matching archives.
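The archive check I have in mind is roughly the following. It's a sketch only: SHA-256 is my assumption for the "known hash", and `archive_matches` is a hypothetical helper name, not part of Cargo or any other tool.

```python
import hashlib
import hmac


def archive_matches(data: bytes, expected_sha256_hex: str) -> bool:
    """Return True if the archive bytes hash to the expected SHA-256 digest."""
    actual = hashlib.sha256(data).hexdigest()
    # compare_digest gives a constant-time comparison of the two hex strings
    return hmac.compare_digest(actual, expected_sha256_hex.lower())
```

Any host can serve the archive; a download is accepted only when `archive_matches` returns True for the hash you already agreed on.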

You'd put the "valid" (aka known/desired) hashes in Cargo.toml, ofc.

And yeah, you could have ppl host matching archives. But then they can't add their own changes to them. Which kinda defeats the point of decentralizing development.

Sure, I get that. But you'd all have to agree at some point on which hash is "the official one", right? I'm not sure that's a whole lot more efficient than just hosting an official archive and hash beside the git repo, although it might save a few steps.

Anything official should really be digitally signed and not just rely on a hash.

Even Debian's apt-get has had vulnerabilities with this in the past, and theoretically everything it downloads is digitally signed.
