I was thinking about adding some protection against typosquatting on lib.rs: detecting when crates have confusingly similar names, which could mislead users into installing the wrong crate, and suggesting another crate instead. This is most important from a security perspective, but it can also be helpful when crates have unusual spellings, or when crate names match common search keywords but aren't the best choice for that keyword (e.g. an abandoned `request` crate vs the creatively spelled `reqwest`).
However, this problem overall turns out to be quite icky.
First, it's really hard to define what is "confusingly similar":
- By an edit distance function, `ttf` and `fft` are typos, but one could argue that to users interested in either of these things the difference is clear. There's also a bunch of embedded HAL crates for chipsets with gibberish model names that differ only by a digit.
- Is a `-rs`/`-rust`/`cargo-` prefix/suffix too similar? Singular vs plural forms? Words swapped?
- There are also crates that are intentionally named similarly, because they have a similar purpose (e.g. `fasta`/`fastq`/`fastx`, or variations of `sdl2`).
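To make the edit-distance problem concrete, here is a minimal sketch of a plain Levenshtein distance. The function name `levenshtein` is hypothetical, and this is only one of several possible similarity metrics, not something lib.rs is known to use.

```rust
/// Levenshtein edit distance between two names, counted in characters:
/// the minimum number of single-character insertions, deletions, and
/// substitutions needed to turn `a` into `b`.
/// Hypothetical helper for illustration only.
fn levenshtein(a: &str, b: &str) -> usize {
    let b: Vec<char> = b.chars().collect();
    // prev[j] = distance between the already-processed prefix of `a`
    // and the first `j` characters of `b`.
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, ca) in a.chars().enumerate() {
        let mut cur = Vec::with_capacity(b.len() + 1);
        // Turning the first i + 1 characters of `a` into "" takes i + 1 deletions.
        cur.push(i + 1);
        for (j, &cb) in b.iter().enumerate() {
            let substitution = prev[j] + usize::from(ca != cb);
            let deletion = prev[j + 1] + 1;
            let insertion = cur[j] + 1;
            cur.push(substitution.min(deletion).min(insertion));
        }
        prev = cur;
    }
    prev[b.len()]
}
```

With this metric, `levenshtein("request", "reqwest")` and `levenshtein("fasta", "fastq")` are both 1, so a distance threshold alone cannot tell a confusing pair from an intentionally related family of crates.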
There's a trade-off between the rate of false positives and the ability to effectively detect typosquatting. With 150K+ crates it's a daunting task. The line is blurry and subjective, so even manual moderation can't guarantee that everyone will be happy.
Secondly, any action based on this has very unpleasant implications.
There has been a real typosquatting attack on crates.io by the `rustdecimal` crate squatting `rust_decimal`, so at least a difference of `_` between words could be considered a problem.
But there's the `iter_tools` crate. It adds a small tweak over `itertools`. It has basically just one user, but it seems to be maintained, and doesn't show any malicious intent. Would it be appropriate to mark this crate as having a too-similar name?
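A rule that only looks at separators can be sketched as follows. Both function names are hypothetical, and the normalization (lowercasing, dropping `-` and `_`) is an assumption about what such a check might do, not a description of any existing policy.

```rust
/// Hypothetical normalization: ASCII-lowercase the name and drop the
/// `-` and `_` separators.
fn normalize(name: &str) -> String {
    name.chars()
        .filter(|c| *c != '-' && *c != '_')
        .map(|c| c.to_ascii_lowercase())
        .collect()
}

/// True when two distinct names become identical after normalization,
/// i.e. they differ only by separators or letter case.
fn differs_only_by_separators(a: &str, b: &str) -> bool {
    a != b && normalize(a) == normalize(b)
}
```

Such a check flags `rustdecimal` against `rust_decimal`, but it flags `iter_tools` against `itertools` in exactly the same way, which is the dilemma: the rule itself cannot see intent.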
In the abstract, people quote Sturgeon's law. Many Rust users, and probably even more non-users, express the sentiment that, from a supply-chain security perspective, they don't want to rely on crates from "some random person". However, turning that into any policy is easily going to insult 90% of crate authors.
An actual typosquatting attack would likely first publish a legitimate-looking crate and wait until enough users, or a particular target, have been tricked into using it before turning the crate malicious. But this means that even legitimate-looking crates need to be flagged as a potential problem. And that can easily be interpreted as a serious accusation based on a subjective rule.
So I don't see how I can do anything about typosquatting without causing a shitstorm.