They're from clippy v1.80-v1.83 with almost everything enabled, including the pedantic, suspicious, and nursery lint groups, plus some rustc lints and warnings. I've allowed crates to silence their own warnings: the extra lints were enabled only for crates that did not have a [lints] table in Cargo.toml.
The data is broken down by crate, so you can check what clippy thinks about each one. I've kept only one message per error code per crate. Some codes have extra text appended after a space; I extracted it from the messages in an attempt to make the codes more specific, e.g. the deprecated code gets the name of the deprecated feature added to it, like deprecated try.
This is very helpful. I thought I was being an exemplary crate maintainer by running clippy before publishing crates and telling it to treat warnings as errors (and refusing to publish unless it passes without any errors). But apparently -W clippy::all is needed to achieve maximum chest hairs.
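For anyone following along, a maintainer can bake this in rather than remembering flags. A minimal sketch (the group names are real clippy groups, but which ones you warn on is entirely a choice):

```rust
// Crate-level lint opt-in at the top of lib.rs or main.rs.
// `clippy::all` covers correctness, suspicious, style, complexity,
// and perf; pedantic and nursery are stricter and more opinionated.
#![warn(clippy::all, clippy::pedantic, clippy::nursery)]
#![warn(unused_qualifications)] // a rustc lint that's allow-by-default
```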
I'm guessing unused_imports comes from people writing code with all features enabled while this (I'm assuming) runs with default features. unused_qualifications I don't understand. The rest are all pretty normal.
It just doesn't warn by default, right? So it's super easy to never notice. It's easy to add a use for an item in a file after you already have some qualified usages in the same file. Similarly, a refactor can move code to a place where something was already imported via use.
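A minimal sketch of that failure mode (names chosen just for illustration):

```rust
use std::mem::swap;

fn main() {
    let (mut a, mut b) = (1, 2);
    swap(&mut a, &mut b);
    // Code moved here later keeps its fully qualified path; the `use`
    // above already brings `swap` into scope, so `unused_qualifications`
    // (allow-by-default) would flag this call once enabled:
    std::mem::swap(&mut a, &mut b);
}
```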
This is an annoying one to discover that I'm guilty of. Petition to make cargo enable this lint by default. (Not really, I'm sure there's a reason for it, but I feel a little stupid for publishing crates that trigger this).
It might be interesting to split the data between crates that appear to be trying to work with clippy (using clippy:: lint names anywhere, or having configured {"rust-analyzer.check.command": "clippy"}, or for that matter having the string cargo clippy anywhere), and those that don't.
We might expect the former to be trying to have zero clippy warnings, and failing due to maintainer error or due to new lints being introduced, whereas the latter will have many more warnings[1] and a different distribution, so the differences might be interesting.
especially because clippy::pedantic has quite a few lints that have the character “do this thing this arbitrary way rather than that one” (e.g. explicit_iter_loop) ↩︎
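For example (a hypothetical snippet; the lint itself is real and lives in the pedantic group):

```rust
fn sum(values: &[u32]) -> u32 {
    let mut total = 0;
    // `clippy::explicit_iter_loop` wants `for v in values` here;
    // both forms iterate by reference and are equivalent.
    for v in values.iter() {
        total += v;
    }
    total
}
```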
Have I done something wrong, or are crates missing from the crates table? I was trying to partition the messages based on whether a crate's version reached 1.x or not, then I realized there was a big mismatch between my join and clippy_results alone. For example, the first crates with code = unused_imports have id = 109, 110, 214, all of which are missing from the crates table.
I suppose that including all the crates would have made the file too big, though; maybe that's the reason.
On a side note, I have a crate with "missing_errors_doc" because the doc is only on the trait methods' declarations, not on all their implementations (the methods operate on integers, and the implementations are generated for each integer type by a declarative macro, which I don't like but is unavoidable). I don't know if that's considered a problem.
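Roughly this shape, if I understand the description (the trait and macro names are made up):

```rust
pub trait Halve: Sized {
    /// Halves the value.
    ///
    /// # Errors
    /// Returns an error if the value is odd.
    fn halve(self) -> Result<Self, &'static str>;
}

// One impl per integer type, stamped out by a declarative macro. The
// generated methods carry no doc comment of their own, which is where
// `missing_errors_doc` reportedly fires despite the documented trait.
macro_rules! impl_halve {
    ($($t:ty),* $(,)?) => {$(
        impl Halve for $t {
            fn halve(self) -> Result<Self, &'static str> {
                if self % 2 == 0 { Ok(self / 2) } else { Err("odd value") }
            }
        }
    )*};
}

impl_halve!(u8, u16, u32, u64, u128);
```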
Ah, that's a bug. I've used INSERT OR REPLACE to add crates, and a replace deletes the old row and inserts a new one with a fresh auto-incremented ID, so if I've checked the same crate twice the previous results are left orphaned, pointing at an ID that no longer exists. The latest results are okay; you can just delete the orphaned ones.
The Rust project regularly uses Crater/Rustwide to run checks across ~all publicly reachable Rust code. According to the docs, that runs on an AWS c5.2xlarge machine with 2Tb storage[1], and the cargobomb machine used for beta regression test runs has 30GB RAM.
I don't recall where to check how long a Crater run generally takes, but the answer is a good long while.
If you do happen to do this, make sure to do so on a sandboxed machine. Even if you only run clippy, you're running untrusted code during the build process.
I'm certain they mean 2TB, since 2Tb is only 250GB. (Byte vs bit) ↩︎
It took me 4 months, but I wasn't running it full time, only as an ad-hoc background job. Assuming 30s per crate, that's 50 CPU-days to try them all (50 days × 86,400 s ÷ 30 s works out to roughly 144,000 crates).
It doesn't need anything special; building Rust on a server is the same as building on your own machine. I've been using a Hetzner ARM machine, giving builds 8 cores, 16GB RAM, and ~300GB of disk space. Disk space needs to be monitored and purged regularly, because Rust/Cargo will eat all the disk you give it.
I use some tricks that help a lot (a rough config sketch follows the list):
Using a git registry index and the unstable no-index-update flag for instant dependency resolution.
Disabling debug info and incremental compilation (they massively balloon disk space and I/O).
Grouping crates by the similarity of their dependencies, and checking them in batches of about 20 as workspace members. Sometimes dependencies conflict and batches need to be reshuffled.
Creating a giant Cargo.lock with everything that has built successfully so far. This helps Cargo skip a lot of deps as Fresh, and avoids big rebuilds due to tiny version variations.
With the last two I don't have to recompile syn a million times.
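To make the list above concrete, here's a hedged sketch of the config side; every value is an assumption about the setup, not a copy of it:

```toml
# .cargo/config.toml: use the git registry index, so that nightly's
# -Zno-index-update flag can skip index refreshes entirely.
[registries.crates-io]
protocol = "git"

# Cargo.toml at the root of one batch workspace (~20 members).
[workspace]
members = ["batch/crate-a", "batch/crate-b"]  # hypothetical paths

# Profiles live in the workspace root; drop debug info and
# incremental artifacts to keep disk usage and I/O down.
[profile.dev]
debug = false
incremental = false
```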
Thanks for these tips. This is really interesting and helpful to me, because I happen to be writing os-checker, a dedicated tool[1] that runs checkers like clippy on a bunch of OS-related Rust codebases[2].
I ran into a storage problem a few days ago when the GitHub Actions disk limit was hit, and I had to run the checks in batches. But my strategy is naive: I just run them in fixed-size chunks.
However, the biggest problem for os-checker is learning which target triples a codebase should be checked on. It means a checker run on a repo will emit multiple results depending on compilation conditions/flags. With the wrong compilation conditions, we might see thousands of errors due to a missing stdlib and get large JSON outputs, like this.
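One place that intent is sometimes declared, when the repo declares it at all, is the crate's cargo config (a hypothetical bare-metal example; the target triple is only for illustration):

```toml
# .cargo/config.toml in a kernel crate: the intended target triple,
# which a tool like os-checker could read instead of guessing.
[build]
target = "riscv64gc-unknown-none-elf"
```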