An idea to resolve typosquatting

So I was thinking about this the other day. Why not adopt the same scheme as other services that allow duplicate names, with a # delimiter and a 4-digit code, like Discord or Each crate with the same name would have a different 4-digit code, but there would also be a canonical name which doesn't require the code.

For example, the random crate might get assigned random#4738 as its unique name. The plain name random would then be kept as a canonical link to random#4738, so cargo add random would automatically reference random#4738. These canonical links could be created on an as-needed basis as a project becomes complete, while freeing up canonical names that were used for projects that were never completed or have fallen by the wayside. It would also mean that you can't typosquat something like rnadom, because canonical names would be created manually by representatives of the community. Until then, you'd only be able to reference a crate in cargo or Cargo.toml using the full unique name.
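To make the scheme concrete, here is a rough sketch of how resolution could work. Everything in it (UniqueName, Registry, set_canonical, resolve) is a made-up name for illustration, not anything Cargo or the registry actually provides, and it assumes canonical links are only ever created by the manual moderator step.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical unique crate name: a base name plus a 4-digit code, e.g. `random#4738`.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct UniqueName {
    pub base: String,
    pub code: u16,
}

impl UniqueName {
    /// Parses `name#NNNN`; anything else yields `None`.
    pub fn parse(s: &str) -> Option<UniqueName> {
        let (base, code) = s.split_once('#')?;
        if base.is_empty() || code.len() != 4 || !code.bytes().all(|b| b.is_ascii_digit()) {
            return None;
        }
        Some(UniqueName { base: base.to_string(), code: code.parse().ok()? })
    }
}

/// Hypothetical registry: every published crate has a unique name,
/// and only some base names have a canonical link.
#[derive(Default)]
pub struct Registry {
    published: HashSet<UniqueName>,
    canonical: HashMap<String, UniqueName>,
}

impl Registry {
    /// Anyone can publish under a unique name.
    pub fn publish(&mut self, name: UniqueName) {
        self.published.insert(name);
    }

    /// The manual step: a moderator points the bare name at one published crate.
    /// Returns false if the target was never published.
    pub fn set_canonical(&mut self, name: &UniqueName) -> bool {
        if !self.published.contains(name) {
            return false;
        }
        self.canonical.insert(name.base.clone(), name.clone());
        true
    }

    /// Full unique names always resolve if published; bare names resolve
    /// only through a canonical link.
    pub fn resolve(&self, requested: &str) -> Option<UniqueName> {
        if requested.contains('#') {
            let name = UniqueName::parse(requested)?;
            return self.published.contains(&name).then_some(name);
        }
        self.canonical.get(requested).cloned()
    }
}
```

With this, someone can still publish rnadom#0001, but resolving the bare name rnadom gives nothing unless a moderator has created that link, so a typo fails instead of silently pulling in the squatter's crate.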

The fundamental problem with any anti-name-squatting proposal that doesn't do away with global human-readable names entirely is finding and supporting the people who will be moderators: the people who do the “create manually by representatives of the community” step in your proposal, or who in other proposals might be deciding whether to remove packages. It's a big job, and often thankless and contentious.


I feel this was a mishap in the design of This could/should be resolved with a simple solution: namespaces. Each user gets a namespace, and after this the package names don't matter.


This rehashes a lot of old conversations, but I don't believe namespacing actually solves the typosquatting problem. It only addresses the squatting half of the problem.

Let's say we're in the future and has namespaces and everything is unicorns and rainbows. Some bad actor decides to create the namespace rnadom, because why not, no one is using it. And some unfortunate soul adds rnadom/random = "1" to their dependencies. Whoops.

Rephrasing @kpreid's stance, this is not really a technical problem with a technical solution. It's a social problem that needs social solutions. Namely someone funding the hires necessary to clean up all the cruft.


I somewhat disagree with this general claim. Technical solutions can change which social problem you have, and some are easier than others.


While this ship has quite likely sailed for Rust, if all code were content-addressable it would solve a lot of problems. By content-addressable I mean something like this: you write some code, and that code (+ maybe some lightweight metadata) is hashed and solely determines the value of the hash. The code and its hash are then uploaded to something close enough conceptually to
When reusing code, you then link to it by using its hash rather than some name/version combo.
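A minimal sketch of that idea, under loud assumptions: content_id and Store are hypothetical names, and the hash is 64-bit FNV-1a only because it fits in a few lines of std-only Rust. FNV-1a is not collision resistant, so a real system would need a cryptographic hash such as SHA-256.

```rust
use std::collections::HashMap;

/// 64-bit FNV-1a. A stand-in only: NOT safe against deliberate collisions.
fn fnv1a(bytes: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf29ce484222325;
    for &b in bytes {
        h ^= u64::from(b);
        h = h.wrapping_mul(0x100000001b3);
    }
    h
}

/// The address of a piece of code is derived only from its metadata and source.
pub fn content_id(metadata: &str, source: &str) -> String {
    // Length-prefix the metadata so ("ab", "c") and ("a", "bc") hash differently.
    let mut buf = Vec::new();
    buf.extend_from_slice(&(metadata.len() as u64).to_le_bytes());
    buf.extend_from_slice(metadata.as_bytes());
    buf.extend_from_slice(source.as_bytes());
    format!("{:016x}", fnv1a(&buf))
}

/// Hypothetical store: there are no names to register, only content to upload.
#[derive(Default)]
pub struct Store {
    blobs: HashMap<String, String>,
}

impl Store {
    /// Uploading returns the hash, which is the only way to refer to the code.
    pub fn upload(&mut self, metadata: &str, source: &str) -> String {
        let id = content_id(metadata, source);
        self.blobs.insert(id.clone(), source.to_string());
        id
    }

    /// Reuse means fetching by hash; a mistyped hash simply finds nothing.
    pub fn fetch(&self, id: &str) -> Option<&str> {
        self.blobs.get(id).map(String::as_str)
    }
}
```

Changing a single character of the source changes the id, so the reference in your dependency list pins exactly the code you looked at.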

These are some of the problems I see it solving given proper integration on language and ecosystem levels (again, I'm not holding my breath on that one):

  • Completely gets rid of typosquatting and lack-of-namespacing issues, because there are no names to imitate or clash with, respectively
  • Code wouldn't necessarily need to live in a full-blown crate for distribution purposes. Having this ability could completely obviate the need for things like gists, since snippets would be directly addressable and thus trivially reusable, and therefore actually more convenient than gists
  • Possibly could provide a more flexible solution for the coherence problem currently solved by Rust's orphan rule by letting Rustaceans declare which content-addressable impl they want to use for a (type, trait) pair in a given context. No need for specialization, just directly specify which impl you want.

One potential tradeoff is that you actually have to address the content by its hash, which is based on the contents, rather than by a human-readable label. (The hash could be aliased, but that brings back the typosquatting and lack-of-namespacing problems.) I'm not sure how NixOS solved this last part, since packages are generally named, so a human-friendly solution to that problem seems possible.


The solution basically boils down to PKI. Not surprisingly, already has all of this; hashes, TLS, human-readable friendly crate names. Perhaps your proposal is to more generally expose the hashes (the ones you see in Cargo.lock) to the language. But I don't know, it still feels like doing this just kicks the can.

I'm not sure about the language proper, but I'd like to see support for it in Cargo.toml dependency entries for sure. If that part is fixed, it wouldn't surprise me if most of the rest fell into place as a direct consequence.

I encourage you to have a look at, and play around with, a NixOS install in a VM. The idea I proposed is basically a ripoff of the Nix package manager.
I've started using NixOS as my daily driver, and dependency hell and typosquatting just aren't real-world concerns there.
And the thing is, the more I use it, the more I see it working in scenarios I previously never thought of. Flatpak/snaps? Almost a nonsensical proposition when you have Nix.

So the theme there is that I can see (an evolution of) the PhD thesis that spawned Nix and NixOS also spawning a solution for package management at the PL level, because of a common set of fundamental problems being solved. The catch though is that our current model is insufficient and needs updating, which most folk in general don't seem to be willing to do.

Maybe we should re-think the way of importing external code.

Today it's simply "I want to use this, I copy-paste the name without even having to check the crate is legitimate".

Using only hashes it would not be very convenient to read a Cargo.toml, and hashes don't solve the problem of typosquatting (or other attacks) in search engines. So it may be nice to create a lot of curated directories, each actively maintained by few people, who would ensure they use consistent naming. A bit like the official package repository of a Linux distribution.

This way you can use names instead of hashes, without the problems of names. You just need to use the crate directory maintained by your company or another organization you trust. (These directories would just map names to hashes.) Crates that are not in your directories would still be accessible via hash.
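Roughly what I have in mind, as a sketch: Directory, Dep and resolve are invented names, the hashes are placeholder strings, and "first trusted directory wins" is just one possible policy I picked for the example.

```rust
use std::collections::HashMap;

/// Hypothetical curated directory: nothing more than a name -> hash map
/// kept consistent by a small group of maintainers.
pub struct Directory {
    pub maintainer: String,
    pub entries: HashMap<String, String>,
}

impl Directory {
    pub fn new(maintainer: &str, entries: &[(&str, &str)]) -> Self {
        Directory {
            maintainer: maintainer.to_string(),
            entries: entries
                .iter()
                .map(|&(name, hash)| (name.to_string(), hash.to_string()))
                .collect(),
        }
    }
}

/// A dependency is written either as a name (looked up in the directories
/// you trust) or directly as a hash.
pub enum Dep<'a> {
    Name(&'a str),
    Hash(&'a str),
}

/// Resolves a dependency to a hash. Names are searched in the trusted
/// directories in priority order (an assumed policy); hashes pass through.
pub fn resolve<'a>(dep: &Dep<'a>, trusted: &'a [Directory]) -> Option<&'a str> {
    match dep {
        Dep::Hash(hash) => Some(*hash),
        Dep::Name(name) => trusted
            .iter()
            .find_map(|dir| dir.entries.get(*name).map(String::as_str)),
    }
}
```

A name that none of your directories know about, such as a typo like rnadom, resolves to nothing rather than to whatever a stranger registered under it, while anything outside the directories stays reachable by hash.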

It consumes more time maintaining the directory, but saves time when ultimately choosing a crate.

I'm aware all this is already possible using Cargo, but highly discouraged by UX, and impractical if you want your crate to be usable by anyone else.

Using hashes would allow cargo to query servers other than the official ones. There could be a list of mirrors, and even a DHT. (These databases would not use names, only hashes.)
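The reason untrusted mirrors become acceptable is that the client can check whatever it receives against the hash it asked for. A small sketch, assuming a made-up Mirror trait and fetch_verified function, and again using FNV-1a purely as a stand-in for a real cryptographic hash:

```rust
use std::collections::HashMap;

/// 64-bit FNV-1a. A stand-in only; a real client needs a cryptographic hash.
fn fnv1a(bytes: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf29ce484222325;
    for &b in bytes {
        h ^= u64::from(b);
        h = h.wrapping_mul(0x100000001b3);
    }
    h
}

/// Hypothetical mirror interface: it knows nothing about names, only hashes.
pub trait Mirror {
    fn get(&self, hash: u64) -> Option<Vec<u8>>;
}

/// An in-memory mirror, standing in for a server or a DHT node.
pub struct MapMirror(pub HashMap<u64, Vec<u8>>);

impl Mirror for MapMirror {
    fn get(&self, hash: u64) -> Option<Vec<u8>> {
        self.0.get(&hash).cloned()
    }
}

/// Asks each mirror in turn and accepts the first answer whose contents
/// actually hash to the requested value. Wrong or tampered answers are skipped.
pub fn fetch_verified(hash: u64, mirrors: &[&dyn Mirror]) -> Option<Vec<u8>> {
    mirrors
        .iter()
        .filter_map(|mirror| mirror.get(hash))
        .find(|bytes| fnv1a(bytes) == hash)
}
```

A mirror that returns the wrong bytes for a hash is simply ignored, so the mirror list does not need to be curated the way a name directory does.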
