Lib.rs website improvements

What's new on //lib.rs/

  • Social media image previews. Links to crates on lib.rs shared on Mastodon, Facebook, etc. look fancier now.

    The previews are dynamically generated. For compatibility, they must be raster images, so they're PNGs rendered with resvg from SVG templates. Resvg is awesome, but text layout in SVG was such a pain even for the simple 3-4 lines of text. I want to add more info there, so suggestions for data and design are welcome.

  • Better caching + purging of pages at the CDN. The lag between crate publication and visibility on lib.rs is down from hours to ~15 minutes (I still have work to do to refresh the index more often). Most pages are also compressed with Brotli level 11 to less than 10% of their raw HTML size, and distributed globally. Pages that are pre-cached on the CDN can load so fast the site can feel like an app running locally, and that isn't even a trick-laden serviceworker webapp, just plain JS-less HTML!

  • I've rewritten automatic keyword guessing. Previously it'd scrape the README looking for words that could be keywords (with TF-IDF), but that would pick up unrelated words, e.g. taking "discord" from "join us on discord" as the crate's keyword. Now I'm preferring keywords that appear in multiple sources: doc comments, identifiers in the code, the README, and crate and repository metadata. It's also smarter about synonyms and can pick 2-3 compound-word keywords. It's still imperfect, so please give your crates explicit keywords and categories!

  • Filtering of bot/mirror traffic from download numbers. I'm denoising download numbers and estimating the noise floor from the oldest, least-used versions of crates. It lessens the impact of the recent change to how crates-io counts downloads.

  • Search ranking improvements. The top few crates are picked using different criteria — some are by relevance, some are by popularity. When words can have multiple meanings, I try to include all of them (e.g. search for "image" gives an image codec, but also docker image and kernel image). I've tuned handling of exact matches: you don't always want an exact match, e.g. there's an abandoned crate named error which may be older than std::error::Error itself.

  • It's possible to sort category pages by number of downloads or most recently published crates. Personally I don't think they're useful, but it's one of the oldest feature requests.

  • The /audit subpage notes which crates are available in Debian and guix. That's better than nothing, but unfortunately that alone is not a safety guarantee (as I've been informed by Debian maintainers), so supply chain security remains a tough problem.

  • Rendering of Markdown is closer to GitHub's rendering. There's a long tail of quirks and tweaks in GitHub's Markdown flavor (e.g. dark theme images), so it may still be imperfect. BTW, proper handling of relative URLs in readmes continues to have mind-bogglingly complex edge cases of symlinks + relative paths + Cargo fixups + proprietary URL schemes + repos changing layout between releases. Please use absolute URLs in Markdown whenever you can, and don't use parent dirs like readme = "../README" in Cargo.toml.

  • In addition to stats on which versions of Rust are supported by crates, I now have data on which versions of Rust people use. The data is scraped from a still-unofficial source, and is likely full of bot and CI build traffic, so take it with a big grain of salt.

  • I'm retiring libs.rs and crates.rs domains to avoid confusion. They show a big warning now that it's just lib.rs (lib, singular).

  • I had to do some work on scaling, performance, and memory usage. In the beginning I laughed at how easy it was to just load all crates into RAM and compute all the data on the fly. That was easy with 5-10K crates. Now there are 140K of them, and I track much more data, so soon I'll have to start using a real database instead of serdeing HashMaps from disk. Also, there are so many crates now that rate limiting of the GitHub and crates-io APIs is often a bottleneck. At a rate of 1 req/s it takes almost two days to go through all of them, and with several calls per crate, if I cache anything for less than a week, I may exceed request quotas!

  • I've got a beefier machine for building crates and estimating their MSRV, so now more crates should have a useful range of versions they likely support. Also many crates specify rust-version now, which is super helpful (but remember to keep that version up to date with code changes!)

  • The new page isn't overwhelmed by daily auto-releasing crates. It prefers more notable updates, based on how big the semver increase was and how long ago the previous version was released.

  • I've rewritten processing of Cargo.toml [features], which is now a reusable crate. The maintainer dashboard now warns when you forget to use the dep: syntax in features.

  • See the previous list of what was added last year.

  • Also shoutout to crates-io team for deleting a ton of namesquatted crates. I see in my logs waves of crates appearing and disappearing, so it's not just that one guy who took a bunch of crates, but an ongoing battle to keep the registry clean.
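
To illustrate the dep: warning mentioned in the list above, here is a minimal sketch of what such a check could look like. It is not the code lib.rs uses: the function name, the data shapes, and the exact rule (a feature that lists an optional dependency by its bare name, where that name is not itself declared in [features]) are all assumptions made for the example.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical check: returns (feature, entry) pairs where a feature
/// enables an optional dependency by its bare name instead of `dep:name`.
/// `features` mirrors the [features] table, `optional_deps` holds the names
/// of dependencies declared with `optional = true`.
fn missing_dep_prefix(
    features: &BTreeMap<&str, Vec<&str>>,
    optional_deps: &BTreeSet<&str>,
) -> Vec<(String, String)> {
    let mut warnings = Vec::new();
    for (feature, entries) in features {
        for entry in entries {
            // `dep:name` is already explicit, and `name/feat` or `name?/feat`
            // is a different syntax that this sketch does not inspect.
            if entry.starts_with("dep:") || entry.contains('/') {
                continue;
            }
            // A bare name that matches an optional dependency, and is not an
            // explicitly declared feature, relies on the implicit feature.
            if optional_deps.contains(entry) && !features.contains_key(entry) {
                warnings.push((feature.to_string(), entry.to_string()));
            }
        }
    }
    warnings
}
```

With a feature such as avif = ["codec"], where codec is a made-up optional dependency, the sketch would report the pair ("avif", "codec"), while png = ["dep:other"] passes.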

73 Likes

I had never seen the maintainer dashboard before. All my crates are a shambles. I love it.

16 Likes

Thanks for this!

I've been using lib.rs for many years since discovering it, via DuckDuckGo's bang !librs. Terrific resource for reaching and exploring crates.

3 Likes

The maintainer dashboard is indeed very neat.

  • It would be cool if outdated dependencies for a package would be shown only if there are newer versions of the dependency that respect the package's MSRV.
  • Sometimes, when you click on the "{package} versions" link shown under an outdated dependency, it brings you to https://lib.rs/crates/{package}/versions and sometimes to https://lib.rs/crates/{package}/rev. This is the case for base64 (versions) and flume (rev) at @01mf02's Rust crates // Lib.rs. Having it point to versions always would seem more useful to me.

All in all, this is really neat. Great work, and thank you for it.

1 Like

Have you looked into using the crates.io database dumps instead of the API when you don't need the most up-to-date data? They are updated every 24 hours, which is better than caching data for a week.

I had to add an option to use the DB dumps to cargo supply-chain because the "1 request per second" limit was really getting in the way of interactive usage.
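
For a sense of scale, the figures from the original post (about 140K crates, 1 req/s) can be turned into a back-of-the-envelope estimate. The helper below is only an illustration. The function name is made up, and the "4 calls per crate" used afterwards is an assumed stand-in for "several calls per crate".

```rust
/// Hours needed for one full pass over all crates at a fixed request rate.
/// Illustrative helper only; it ignores retries, bursts and quota resets.
fn hours_for_full_pass(crates: u64, calls_per_crate: u64, reqs_per_sec: f64) -> f64 {
    let total_requests = (crates * calls_per_crate) as f64;
    total_requests / reqs_per_sec / 3600.0
}
```

hours_for_full_pass(140_000, 1, 1.0) comes out at roughly 39 hours, i.e. about 1.6 days, which matches "almost two days". With an assumed 4 calls per crate it is roughly 6.5 days, so refreshing everything more often than weekly leaves almost no headroom.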

3 Likes

I'm curious as to how you finance lib.rs and what it costs to run. It is an amazing resource for all of the community, and I would happily donate a few bucks every now and then if that would help.

4 Likes

Yes, I'm using the datadump extensively already.

It costs €30/month to run, so the main investment is my time.

25 Likes

I want to use this opportunity to thank you for the time you spend maintaining lib.rs. It's my main way to look up crates and I like it a lot more than crates.io. I'm sure I'm not the only person with this opinion, too.

23 Likes

Lib.rs is wonderful, and I want to say thank you once again for all the work you do on it. I have one question:

Is there any documentation on this? I was wondering what an absolute path to a README.md in the directory above the package root would look like. (I ask out of curiosity, not because I’ve written any crates!)

1 Like

I meant absolute URLs in Markdown. Something as simple-looking as:

![my logo](assets/logo.png)

requires knowing git repository layout, commit hash, how the README was bundled into the tarball, and web host's own URL scheme they use for proxying images (which means I only support GitHub and GitLab, and the rest is probably broken, because there's no standard for mapping git URLs to their web-hosted data).

I recommend this instead:

![my logo](https://raw.githubusercontent.com/x/y/z/assets/logo.png)

And in Cargo.toml

readme = "../../../path"

makes Cargo auto-fix the path to be "./path", but that breaks the readme-relative links, because the packaged readme is in a different directory than the readme on GitHub.
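
To show why the readme's directory matters, here is a rough sketch of resolving a readme-relative link into a raw.githubusercontent.com URL. It is an illustration under assumptions, not what lib.rs does: the function name and parameters are hypothetical, symlinks and other hosts are ignored, and links starting with / are assumed to be relative to the repository root.

```rust
/// Hypothetical helper: builds a raw.githubusercontent.com URL for a link
/// found in a readme. `readme_dir` is the readme's directory inside the
/// repository ("" for the repo root). Returns None if the link climbs
/// above the repository root.
fn raw_github_url(
    owner: &str,
    repo: &str,
    git_ref: &str,
    readme_dir: &str,
    link: &str,
) -> Option<String> {
    // Assumption: a leading "/" means "relative to the repository root".
    let mut parts: Vec<&str> = if link.starts_with('/') {
        Vec::new()
    } else {
        readme_dir
            .split('/')
            .filter(|s| !s.is_empty() && *s != ".")
            .collect()
    };
    for segment in link.split('/') {
        match segment {
            "" | "." => {}
            // Popping from an empty path means the link escapes the repo.
            ".." => {
                parts.pop()?;
            }
            other => parts.push(other),
        }
    }
    Some(format!(
        "https://raw.githubusercontent.com/{owner}/{repo}/{git_ref}/{}",
        parts.join("/")
    ))
}
```

The same link, assets/logo.png, resolves to a different URL depending on whether readme_dir is "" or, say, "crates/cli". That is the mismatch that appears when the packaged readme sits in a different directory than the one in the repository.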

2 Likes

That is unfortunate. I use this feature where I have a virtual workspace (no root crate) but I consider one of the crates to be primary (the command line program). Thus I want the user facing readme in the repo to make sense. And I reuse that readme for the crate.

What would be a good option here? I don't like the idea of duplicating the readme, and I find not having a root crate makes more sense (it cleanly separates the workspace settings from the crate settings in Cargo.toml).

1 Like

If you use absolute https URLs in your readme, it can be anywhere. Absolute paths (starting with /) are also salvageable.

Besides that, consider having separate readmes for individual crates. IMHO it’s more useful when searching and browsing lib.rs site.
It’s possible to include readme files as doc comments in src/lib.rs file, so if you already have crate-level docs that can be an easy way to have specialized readmes too.

1 Like

I do have that (a separate readme for the library crate), but I expect most users to be users of the command line program, not the underlying library crate. Thus I share repo root readme with the command line program crate readme.

This is optimised for people viewing the repo on GitHub, not browsing crates.io or lib.rs. For a command line program where "it is in Rust" is not the primary feature (gasp!) I believe this makes sense.

I believe that makes sense in my specific case, where the library crate is basically not a bin+lib because I want to reuse part of it in another future project and don't want bin specific dependencies (like clap and the man gen/completion build script) being pulled in when I depend on that library (this is a big weakness of bin+lib crates).

1 Like

In Tokio, we have many crates in one repo, and the root readme is the one for the main Tokio crate. We make this work by just having two copies of the file and using a ci check to ensure that both files are identical.
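
A check along these lines can be very small. The sketch below is not Tokio's actual CI step, just one way to express "both files must be identical" in Rust. The function name is made up, and the two paths would be whatever the repository uses.

```rust
use std::{fs, io, path::Path};

/// Returns Ok(true) when both files have byte-for-byte identical contents.
/// A CI job could call this with the two readme paths and fail on `false`.
fn readmes_identical(a: &Path, b: &Path) -> io::Result<bool> {
    Ok(fs::read(a)? == fs::read(b)?)
}
```

Comparing raw bytes keeps the check strict, so even whitespace drift between the two copies is caught.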

1 Like

Worst case you could just have the repo readme say (morally) see [the CLI readme](crates/CLI/readme.md) for usage, right?