What's new on lib.rs
-
Social media image previews. Links to crates on lib.rs shared on Mastodon, Facebook, etc. look fancier now.
The previews are dynamically generated. For compatibility, they must be raster images, so they're PNGs rendered with resvg from SVG templates. Resvg is awesome, but text layout in SVG was such a pain even for the simple 3-4 lines of text. I want to add more info there, so suggestions for data and design are welcome.
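Since text layout is the painful part, here is a minimal sketch of one way to handle it, assuming a crude character count stands in for real font metrics: wrap the text greedily and emit one `<tspan>` per line. The function names, coordinates, and limits are made up for illustration and are not lib.rs's actual template code; the resulting SVG string would still need to be rasterized with resvg.

```rust
/// Escape the characters that are special in XML text nodes.
fn xml_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            _ => out.push(c),
        }
    }
    out
}

/// Greedy word wrap: start a new line when adding the next word would
/// exceed `max_chars`. This counts characters, not rendered width.
fn wrap(text: &str, max_chars: usize) -> Vec<String> {
    let mut lines = Vec::new();
    let mut current = String::new();
    for word in text.split_whitespace() {
        let needed = current.chars().count() + word.chars().count() + 1;
        if !current.is_empty() && needed > max_chars {
            lines.push(std::mem::take(&mut current));
        }
        if !current.is_empty() {
            current.push(' ');
        }
        current.push_str(word);
    }
    if !current.is_empty() {
        lines.push(current);
    }
    lines
}

/// Build an SVG `<text>` element with one `<tspan>` per wrapped line.
/// Every line after the first is shifted down by `line_height`.
fn text_block(text: &str, x: u32, y: u32, line_height: u32, max_chars: usize) -> String {
    let mut svg = format!("<text x=\"{x}\" y=\"{y}\">");
    for (i, line) in wrap(text, max_chars).iter().enumerate() {
        let dy = if i == 0 { 0 } else { line_height };
        svg.push_str(&format!(
            "<tspan x=\"{x}\" dy=\"{dy}\">{}</tspan>",
            xml_escape(line)
        ));
    }
    svg.push_str("</text>");
    svg
}
```

For example, `text_block(description, 40, 120, 48, 40)` would produce a block of lines of at most roughly 40 characters each. A proportional font makes the character count only a rough guess, which is a big part of why this is a pain.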
-
Better caching + purging of pages at the CDN. The lag between crate publication and visibility on lib.rs is down from hours to ~15 minutes (I still have work to do to refresh the index more often). Most pages are also compressed with Brotli level 11 to less than 10% of their raw HTML size, and distributed globally. Pages that are pre-cached on the CDN can load so fast the site can feel like an app running locally, and that isn't even a trick-laden serviceworker webapp, just plain JS-less HTML!
-
I've rewritten automatic keyword guessing. Previously it scraped the README for words that could be keywords (using TF-IDF), but that tended to pick unrelated words, e.g. taking "discord" from "join us on discord" as the crate's keyword. Now I prefer keywords that appear in multiple sources: doc comments, identifiers in the code, the README, and crate and repository metadata. It's also smarter about synonyms and can pick 2-3 compound-word keywords. It's still imperfect, so please give your crates explicit keywords and categories!
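The "multiple sources" idea can be sketched roughly like this: count how many distinct sources mention each candidate word and keep only the ones seen in several. This is a simplified illustration, not the real implementation; the function name, the 3-character minimum, and the threshold are assumptions, and it ignores synonyms and compound words entirely.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Return candidate keywords that occur in at least `min_sources` distinct
/// sources, ordered by how many sources mention them (most first).
/// `sources` is a list of `(source_name, text)` pairs.
fn keywords_from_sources(sources: &[(&str, &str)], min_sources: usize) -> Vec<String> {
    // word -> set of source names it was seen in
    let mut seen: BTreeMap<String, BTreeSet<&str>> = BTreeMap::new();
    for &(source_name, text) in sources {
        for word in text.split(|c: char| !c.is_alphanumeric()) {
            // Skip empty fragments and very short words.
            if word.len() < 3 {
                continue;
            }
            seen.entry(word.to_lowercase())
                .or_default()
                .insert(source_name);
        }
    }
    let mut ranked: Vec<(String, usize)> = seen
        .into_iter()
        .map(|(word, names)| (word, names.len()))
        .filter(|(_, n)| *n >= min_sources)
        .collect();
    // Stable sort: ties keep the alphabetical order they got from the map.
    ranked.sort_by(|a, b| b.1.cmp(&a.1));
    ranked.into_iter().map(|(word, _)| word).collect()
}
```

With a README saying "Join us on Discord! A fast JSON parser", docs saying "Parse JSON values quickly", and metadata saying "json parser", a threshold of 2 keeps "json" and "parser" and drops "discord", because it only shows up in the README.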
-
Filtering of bot/mirror traffic from download numbers. I'm denoising download numbers and estimating the noise floor from the oldest, least-used versions of crates. This lessens the impact of the recent change to how crates-io counts downloads.
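As a rough illustration of the noise-floor idea (not the actual lib.rs algorithm): assume that the oldest, least-used versions of a crate get almost no human downloads, so whatever they do get is mostly bots and mirrors. The sketch below takes the median of those counts as the floor and subtracts it from every version. Using a median and a flat per-version subtraction are both assumptions made for this example.

```rust
/// Estimate the per-version noise floor as the median download count of
/// old, rarely used versions (the upper median for even-length input).
fn noise_floor(old_version_downloads: &[u64]) -> u64 {
    if old_version_downloads.is_empty() {
        return 0;
    }
    let mut sorted = old_version_downloads.to_vec();
    sorted.sort_unstable();
    sorted[sorted.len() / 2]
}

/// Subtract the floor from each version's downloads, clamping at zero,
/// and sum what is left.
fn denoised_total(per_version_downloads: &[u64], floor: u64) -> u64 {
    per_version_downloads
        .iter()
        .map(|&d| d.saturating_sub(floor))
        .sum()
}
```

If five ancient versions were downloaded 12, 9, 15, 10 and 11 times, the floor comes out as 11, and a version with 1011 downloads would be counted as 1000.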
-
Search ranking improvements. The top few crates are picked using different criteria: some by relevance, some by popularity. When words can have multiple meanings, I try to include all of them (e.g. a search for "image" gives an image codec, but also docker image and kernel image). I've tuned handling of exact matches: you don't always want an exact match, e.g. there's an abandoned crate named `error` which may be older than `std::error::Error` itself.
-
It's possible to sort category pages by number of downloads or most recently published crates. Personally I don't think they're useful, but it's one of the oldest feature requests.
-
The `/audit` subpage notes which crates are available in Debian and Guix. That's better than nothing, but unfortunately that alone is not a safety guarantee (as I've been informed by Debian maintainers), so supply chain security remains a tough problem.
-
Rendering of Markdown is closer to GitHub's rendering. There's a long tail of quirks and tweaks in GitHub's Markdown flavor (e.g. dark theme images), so it may still be imperfect. BTW, proper handling of relative URLs in readmes continues to have mind-bogglingly complex edge cases of symlinks + relative paths + Cargo fixups + proprietary URL schemes + repos changing layout between releases. Please use absolute URLs in Markdown whenever you can, and don't use parent dirs like `readme = "../README"` in `Cargo.toml`.
-
In addition to stats on which versions of Rust are supported by crates, I now have data on which versions of Rust people use. The data is scraped from a still-unofficial source, and is likely full of bot and CI build traffic, so take it with a big grain of salt.
-
I'm retiring the `libs.rs` and `crates.rs` domains to avoid confusion. They now show a big warning that it's just lib.rs (lib, singular).
-
I had to do some work on scaling, performance, and memory usage. In the beginning I laughed at how easy it was to just load all crates into RAM and compute all the data on the fly. That was easy with 5-10K crates. Now there are 140K of them, and I track much more data, so soon I'll have to start using a real database instead of serdeing `HashMap`s from disk. Also, there are so many crates now that rate limiting of the GitHub and crates-io APIs is often a bottleneck. At a rate of 1 req/s it takes almost two days to go through all of them, and with several calls per crate, if I cache anything for less than a week, I may exceed request quotas!
-
I've got a beefier machine for building crates and estimating their MSRV, so now more crates should have a useful range of versions they likely support. Also many crates specify `rust-version` now, which is super helpful (but remember to keep that version up to date with code changes!)
-
The new page isn't overwhelmed by daily auto-releasing crates. It prefers more notable updates, based on how big the semver increase was and how long ago the previous version was released.
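To make the two signals concrete, here is a hedged sketch of what such a score could look like. The weights, the one-year cap, and all the names are invented for this example, and the real ranking certainly differs. The only non-invented rule is that Cargo treats a 0.x to 0.y bump as breaking.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Version {
    major: u64,
    minor: u64,
    patch: u64,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Bump {
    Patch,
    Minor,
    Major,
}

/// Classify how big the jump from `prev` to `next` is.
fn bump(prev: Version, next: Version) -> Bump {
    if next.major != prev.major {
        Bump::Major
    } else if next.minor != prev.minor {
        // Cargo treats 0.x -> 0.y as a breaking change.
        if prev.major == 0 { Bump::Major } else { Bump::Minor }
    } else {
        Bump::Patch
    }
}

/// Hypothetical notability score: bigger bumps and longer gaps since the
/// previous release score higher. The weights are arbitrary.
fn notability(prev: Version, next: Version, days_since_prev: u32) -> u32 {
    let bump_weight = match bump(prev, next) {
        Bump::Major => 30,
        Bump::Minor => 10,
        Bump::Patch => 1,
    };
    // Cap the gap so that one very old crate can't dominate the page.
    let gap = days_since_prev.min(365);
    bump_weight * (1 + gap)
}
```

With these made-up weights, a daily 1.2.3 to 1.2.4 auto-release scores 2, while a 1.9.0 to 2.0.0 release after 200 days scores 6030, so the daily releases sink to the bottom.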
-
I've rewritten processing of `Cargo.toml` `[features]`, which is now a reusable crate. The maintainer dashboard now warns when you forget to use the `dep:` syntax in features.
-
Also, shoutout to the crates-io team for deleting a ton of namesquatted crates. I see waves of crates appearing and disappearing in my logs, so it's not just that one guy who took a bunch of crates, but an ongoing battle to keep the registry clean.
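Going back to the `dep:` warning mentioned earlier, the kind of check involved can be sketched as follows. This is not the API of the reusable crate; `missing_dep_prefix` and its input types are hypothetical, and the features table is assumed to be already parsed. It flags feature entries that name an optional dependency by its bare name (relying on the implicit feature) instead of writing `dep:name`.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Returns `(feature, entry)` pairs where `entry` is the bare name of an
/// optional dependency rather than `dep:name`.
fn missing_dep_prefix(
    features: &BTreeMap<&str, Vec<&str>>,
    optional_deps: &BTreeSet<&str>,
) -> Vec<(String, String)> {
    let mut warnings = Vec::new();
    for (&feature, entries) in features {
        for &entry in entries {
            // `dep:name` and `name/feature` entries are not bare names.
            if entry.starts_with("dep:") || entry.contains('/') {
                continue;
            }
            // A bare name that is also an explicitly declared feature is fine.
            if features.contains_key(entry) {
                continue;
            }
            if optional_deps.contains(entry) {
                warnings.push((feature.to_string(), entry.to_string()));
            }
        }
    }
    warnings
}
```

For a manifest with `fast = ["rayon"]` where `rayon` is an optional dependency and no `rayon` feature is declared, this reports `("fast", "rayon")`; changing the entry to `"dep:rayon"` silences it.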