Lib.rs website improvements

kornel · March 13, 2024, 1:53am

What's new on //lib.rs/

Social media image previews. Links to crates on lib.rs shared on Mastodon, Facebook, etc. look fancier now.

The previews are dynamically generated. For compatibility, they must be raster images, so they're PNGs rendered with resvg from SVG templates. Resvg is awesome, but text layout in SVG was such a pain even for the simple 3-4 lines of text. I want to add more info there, so suggestions for data and design are welcome.
Better caching + purging of pages at the CDN. The lag between crate publication and visibility on lib.rs is down from hours to ~15 minutes (I still have work to do to refresh the index more often). Most pages are also compressed with Brotli level 11 to less than 10% of their raw HTML size, and distributed globally. Pages that are pre-cached on the CDN can load so fast the site can feel like an app running locally, and that isn't even a trick-laden serviceworker webapp, just plain JS-less HTML!
I've rewritten automatic keyword guessing. Previously it'd scrape the README looking for words that could be keywords (with TF-IDF), but that used to pick unrelated words, e.g. "join us on discord" would make "discord" the crate's keyword. Now I'm preferring keywords that appear in multiple sources: doc comments, identifiers in the code, the README, and crate and repository metadata. It's also smarter about synonyms and can pick 2-3 compound-word keywords. It's still imperfect, so please give your crates explicit keywords and categories!
Filtering of bot/mirror traffic from download numbers. I'm denoising download numbers and estimating noise floor from oldest, least used versions of crates. It lessens the impact of the recent change to how crates-io counts downloads.
Search ranking improvements. The top few crates are picked using different criteria — some are by relevance, some are by popularity. When words can have multiple meanings, I try to include all of them (e.g. search for "image" gives an image codec, but also docker image and kernel image). I've tuned handling of exact matches: you don't always want an exact match, e.g. there's an abandoned crate named error which may be older than std::error::Error itself.
It's possible to sort category pages by number of downloads or most recently published crates. Personally I don't think they're useful, but it's one of the oldest feature requests.
The /audit subpage notes which crates are available in Debian and guix. That's better than nothing, but unfortunately that alone is not a safety guarantee (as I've been informed by Debian maintainers), so supply chain security remains a tough problem.
Rendering of Markdown is closer to GitHub's rendering. There's a long tail of quirks and tweaks in GitHub's Markdown flavor (e.g. dark theme images), so it may still be imperfect. BTW, proper handling of relative URLs in readmes continues to have mind-bogglingly complex edge cases of symlinks + relative paths + Cargo fixups + proprietary URL schemes + repos changing layout between releases. Please use absolute URLs in Markdown whenever you can, and don't use parent dirs like readme = "../README" in Cargo.toml.
In addition to stats on which versions of Rust are supported by crates, I now have data on which versions of Rust people use. The data is scraped from a still-unofficial source, and is likely full of bot and CI build traffic, so take it with a big grain of salt.
I'm retiring libs.rs and crates.rs domains to avoid confusion. They show a big warning now that it's just lib.rs (lib, singular).
I had to do some work on scaling, performance, and memory usage. In the beginning I laughed at how easy it was to just load all crates into RAM and compute all the data on the fly. That was easy with 5-10K crates. Now there are 140K of them, and I track much more data, so soon I'll have to start using a real database instead of serdeing HashMaps from disk. Also there are so many crates now that rate limiting of GitHub and crates-io APIs is often a bottleneck. At a rate of 1 req/s it takes almost two days to go through all of them, and with several calls per crate, if I cache anything for less than a week, I may exceed request quotas!
I've got a beefier machine for building crates and estimating their MSRV, so now more crates should have a useful range of versions they likely support. Also many crates specify rust-version now, which is super helpful (but remember to keep that version up to date with code changes!)
The new page isn't overwhelmed by daily auto-releasing crates. It prefers more notable updates, based on how big the semver increase was and how long ago the previous version was released.
I've rewritten processing of Cargo.toml [features], which is now a reusable crate. The maintainer dashboard now warns when you forget to use the dep: syntax in features.
See the previous list of what was added last year.
Also shoutout to crates-io team for deleting a ton of namesquatted crates. I see in my logs waves of crates appearing and disappearing, so it's not just that one guy who took a bunch of crates, but an ongoing battle to keep the registry clean.
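
For readers curious about the keyword guessing change above, here is a minimal sketch of the "prefer words seen in multiple sources" idea. It is not lib.rs's actual code: the `Source` enum, the `rank_keywords` function, and the `min_sources` threshold are all hypothetical names made up for illustration, and the real implementation also handles synonyms and compound words, which this ignores.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical labels for where a candidate word was found.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum Source {
    DocComments,
    Identifiers,
    Readme,
    Metadata,
}

/// Rank candidate words by the number of distinct sources that mention them,
/// dropping words seen in fewer than `min_sources` sources.
fn rank_keywords(candidates: &[(Source, &str)], min_sources: usize) -> Vec<String> {
    let mut seen: HashMap<String, HashSet<Source>> = HashMap::new();
    for (source, word) in candidates {
        seen.entry(word.to_lowercase()).or_default().insert(*source);
    }
    let mut ranked: Vec<(String, usize)> = seen
        .into_iter()
        .map(|(word, sources)| (word, sources.len()))
        .filter(|(_, n)| *n >= min_sources)
        .collect();
    // Most sources first; ties are broken alphabetically for a stable order.
    ranked.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    ranked.into_iter().map(|(word, _)| word).collect()
}

fn main() {
    // Made-up input: "discord" only appears in the README, so it is dropped.
    let candidates = [
        (Source::Readme, "discord"),
        (Source::Readme, "png"),
        (Source::DocComments, "png"),
        (Source::Identifiers, "png"),
        (Source::Readme, "decoder"),
        (Source::Metadata, "decoder"),
    ];
    println!("{:?}", rank_keywords(&candidates, 2)); // ["png", "decoder"]
}
```

Counting distinct sources rather than raw occurrences is what keeps a word repeated many times in one README from winning on its own.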

oconnor663 · March 13, 2024, 6:42am

I had never seen the maintainer dashboard before. All my crates are a shambles. I love it.

murla · March 13, 2024, 9:17am

Thanks for this!

I've been using lib.rs for many years since discovering it, via DuckDuckGo's bang !librs. Terrific resource for reaching and exploring crates.

01mf02 · March 13, 2024, 10:06am

The maintainer dashboard is indeed very neat.

It would be cool if outdated dependencies for a package would be shown only if there are newer versions of the dependency that respect the package's MSRV.
Sometimes, when you click on the "{package} versions" link shown under an outdated dependency, it brings you to https://lib.rs/crates/{package}/versions and sometimes to https://lib.rs/crates/{package}/rev. This is the case for base64 (versions) and flume (rev) at @01mf02's Rust crates // Lib.rs. Having it point to versions always would seem more useful to me.
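
The MSRV-aware filtering suggested in the first point could look roughly like the sketch below. Everything here is an assumption for illustration: the `Release` struct, the `usable_upgrades` function, and the version numbers are made up, and skipping releases with an unknown MSRV is just one possible policy.

```rust
/// A Rust version as (major, minor, patch); tuples compare lexicographically.
type RustVersion = (u32, u32, u32);

/// Hypothetical record of one newer release of a dependency.
struct Release {
    version: &'static str,
    /// MSRV of this release, if known (e.g. from `rust-version`).
    msrv: Option<RustVersion>,
}

/// Keep only the newer releases that still build on the package's own MSRV.
/// Releases with an unknown MSRV are skipped (an assumption, not a given).
fn usable_upgrades(newer: &[Release], package_msrv: RustVersion) -> Vec<&'static str> {
    newer
        .iter()
        .filter(|r| matches!(r.msrv, Some(m) if m <= package_msrv))
        .map(|r| r.version)
        .collect()
}

fn main() {
    // Made-up releases of some dependency.
    let newer = [
        Release { version: "0.22.0", msrv: Some((1, 48, 0)) },
        Release { version: "0.23.0", msrv: Some((1, 70, 0)) },
        Release { version: "0.24.0", msrv: None },
    ];
    // A package with MSRV 1.56 would only be told about 0.22.0.
    println!("{:?}", usable_upgrades(&newer, (1, 56, 0)));
}
```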

All in all, this is really neat. Great work, and thank you for it.

Shnatsel · March 13, 2024, 5:03pm

Have you looked into using the crates.io database dumps instead of the API when you don't need the most up-to-date data? They are updated every 24 hours, which is better than caching data for a week.

I had to add an option to use the DB dumps to cargo supply-chain because the "1 request per second" limit was really getting in the way of interactive usage.
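
To put the 1 request per second limit in perspective, here is the arithmetic behind the "almost two days" figure from the opening post, as a small sketch. The 140K crates and 1 req/s come from the post; the `crawl_days` helper and the choice of 4 calls per crate (standing in for "several") are assumptions for illustration.

```rust
/// Days needed to make `calls_per_crate` API calls for every crate
/// when limited to `req_per_sec` requests per second.
fn crawl_days(crates: u64, calls_per_crate: u64, req_per_sec: f64) -> f64 {
    const SECONDS_PER_DAY: f64 = 86_400.0;
    (crates * calls_per_crate) as f64 / req_per_sec / SECONDS_PER_DAY
}

fn main() {
    // One call per crate: 140_000 s / 86_400 s per day.
    println!("{:.2} days", crawl_days(140_000, 1, 1.0)); // 1.62 days
    // Four calls per crate (illustrative guess for "several").
    println!("{:.2} days", crawl_days(140_000, 4, 1.0)); // 6.48 days
}
```

With a handful of calls per crate a full pass already takes most of a week, which is why a cache lifetime shorter than that risks exceeding the quota.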

Vorpal · March 13, 2024, 5:29pm

I'm curious as to how you finance lib.rs and what it costs to run. It is an amazing resource for all of the community, and I would happily donate a few bucks every now and then if that would help.

kornel · March 13, 2024, 6:09pm

Yes, I'm using the datadump extensively already.

kornel · March 13, 2024, 6:11pm

It costs €30/month to run, so the main investment is my time.

holly-hacker · March 14, 2024, 8:46am

I want to use this opportunity to thank you for the time you spend maintaining lib.rs. It's my main way to look up crates and I like it a lot more than crates.io. I'm sure I'm not the only person with this opinion, too.

Aankhen · March 15, 2024, 9:23am

Lib.rs is wonderful, and I want to say thank you once again for all the work you do on it. I have one question:

Is there any documentation on this? I was wondering what an absolute path to a README.md in the directory above the package root would look like. (I ask out of curiosity, not because I’ve written any crates!)

kornel · March 15, 2024, 3:27pm

I've meant absolute URLs in Markdown. Something as simple-looking as:

![my logo](assets/logo.png)

requires knowing git repository layout, commit hash, how the README was bundled into the tarball, and web host's own URL scheme they use for proxying images (which means I only support GitHub and GitLab, and the rest is probably broken, because there's no standard for mapping git URLs to their web-hosted data).

I recommend this instead:

![my logo](https://raw.githubusercontent.com/x/y/z/assets/logo.png)

And in Cargo.toml

readme = "../../../path"

makes Cargo auto-fix the path to be "./path", but that breaks the readme-relative links, because the packaged readme is in a different directory than the readme on GitHub.
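
As a rough illustration of why this is fiddly, here is a minimal sketch of turning a README-relative link into an absolute raw.githubusercontent.com URL. It is not how lib.rs does it: the `absolute_github_url` function and its parameters are hypothetical, it assumes the caller already knows the owner, repository, revision, and the README's directory inside the repository, and it ignores symlinks, Cargo's path fixups, query strings, and every host other than GitHub.

```rust
/// Build an absolute raw.githubusercontent.com URL for a link found in a README.
/// `readme_dir` is the README's directory inside the repository ("" for the root).
/// Links with a scheme (https://...) should be left alone by the caller.
/// Returns None if the link climbs above the repository root.
fn absolute_github_url(
    owner: &str,
    repo: &str,
    rev: &str,
    readme_dir: &str,
    link: &str,
) -> Option<String> {
    // A leading '/' is treated as relative to the repository root.
    let mut parts: Vec<&str> = if link.starts_with('/') {
        Vec::new()
    } else {
        readme_dir.split('/').filter(|s| !s.is_empty()).collect()
    };
    for segment in link.split('/') {
        match segment {
            "" | "." => {}
            // `pop` returns None when there is no parent left to go up to.
            ".." => {
                parts.pop()?;
            }
            other => parts.push(other),
        }
    }
    Some(format!(
        "https://raw.githubusercontent.com/{}/{}/{}/{}",
        owner,
        repo,
        rev,
        parts.join("/")
    ))
}

fn main() {
    // Same placeholders (x/y/z) as in the post above.
    println!("{:?}", absolute_github_url("x", "y", "z", "", "assets/logo.png"));
    println!("{:?}", absolute_github_url("x", "y", "z", "crates/cli", "../../assets/logo.png"));
    println!("{:?}", absolute_github_url("x", "y", "z", "", "../logo.png"));
}
```

The third call returns None, which is roughly the situation created when the packaged readme no longer sits where the repository's readme did.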

Vorpal · March 15, 2024, 4:08pm

kornel:

And in Cargo.toml
readme = "../../../path"
makes Cargo auto-fix the path to be "./path", but that breaks the readme-relative links, because the packaged readme is in a different directory than the readme on GitHub.

That is unfortunate. I use this feature where I have a virtual workspace (no root crate) but I consider one of the crates to be primary (the command line program). Thus I want the user facing readme in the repo to make sense. And I reuse that readme for the crate.

What would be a good option here? I don't like the idea of duplicating the readme, and I find not having a root crate makes more sense (cleanly separates out the workspace settings from the crate settings in Cargo.toml).

kornel · March 15, 2024, 7:52pm

If you use absolute https URLs in your readme, it can be anywhere. Absolute paths (starting with /) are also salvageable.

Besides that, consider having separate readmes for individual crates. IMHO it’s more useful when searching and browsing lib.rs site.
It’s possible to include readme files as doc comments in src/lib.rs file, so if you already have crate-level docs that can be an easy way to have specialized readmes too.

Vorpal · March 15, 2024, 10:04pm

I do have that (a separate readme for the library crate), but I expect most users to be users of the command line program, not the underlying library crate. Thus I share repo root readme with the command line program crate readme.

This is optimised for people viewing the repo on github, not browsing crates.io or lib.rs. For a command line program where "it is in Rust" is not the primary feature (gasp!) I believe this makes sense.

I believe that makes sense in my specific case, where the library crate is basically not a bin+lib because I want to reuse part of it in another future project and don't want bin specific dependencies (like clap and the man gen/completion build script) being pulled in when I depend on that library (this is a big weakness of bin+lib crates).

alice · March 16, 2024, 10:07am

In Tokio, we have many crates in one repo, and the root readme is the one for the main Tokio crate. We make this work by just having two copies of the file and using a ci check to ensure that both files are identical.
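
A check of this kind can be very small. The sketch below is not Tokio's actual CI step; `readmes_match` is a hypothetical helper, and the demo uses throwaway files in the temp directory where a real job would pass the two README paths from the repository.

```rust
use std::{fs, io, path::Path};

/// Returns Ok(true) when both files have byte-identical contents.
fn readmes_match(a: &Path, b: &Path) -> io::Result<bool> {
    Ok(fs::read(a)? == fs::read(b)?)
}

fn main() -> io::Result<()> {
    // Demo with throwaway files; a CI job would use the real README paths
    // and fail the build when the function returns false.
    let dir = std::env::temp_dir();
    let a = dir.join("readme_check_a.md");
    let b = dir.join("readme_check_b.md");
    fs::write(&a, "# Demo\n")?;
    fs::write(&b, "# Demo\n")?;
    assert!(readmes_match(&a, &b)?);

    // After one copy drifts, the check reports a mismatch.
    fs::write(&b, "# Demo (edited)\n")?;
    assert!(!readmes_match(&a, &b)?);
    Ok(())
}
```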

simonbuchan · March 18, 2024, 6:04am

Worst case you could just have the repo readme say (morally) see [the CLI readme](crates/CLI/readme.md) for usage, right?

hoijui · June 1, 2024, 9:07am

I love your site, thank you!
I would like to help promote it by adding a badge in the README of my projects. I saw it requested before, but could not find a badge promoted by you/the site, so I crafted this one. I am sure it could be done better; just see it as a conversation starter.
I used the colors from the landing page of lib.rs (Remove the labelColor to get the default badge grey instead of this lighter one). I bet Statistics is also not a good term, but I had to put something. The lib.rs mimics the crates.io badges.

8573 · July 21, 2024, 9:07pm

Could you use symlinks like openssl?

8573 · July 21, 2024, 9:37pm

If one wants to use the MSRV data for other analysis, is there a better way than scraping the HTML of the versions pages? How feasible would it be to provide the MSRV data in a more machine-readable form (either for individual crates [1] or as ecosystem-wide dumps, whichever is easier for you)?

I don't personally have an immediate use for these data, but, considering the work required to compute them, I feel like they deserve to get others' attention and reuse.

[1] e.g. https://lib.rs/crates/bitflags/versions.json

riking · July 21, 2024, 9:42pm

The audit aggregation page is very nice! Definitely a much nicer way to read the audits than the manual toml trawling I was doing...
