Many crate READMEs contain relative URLs, such as links to [the LICENSE file](./LICENSE) or embedded images.
The problem is that these URLs are meant to be relative to the repository, but when the README is published as part of a crate, it loses that context. Crates are published as `.tar.gz` archives, which don't have URLs for individual files. Some crates even use paths to parent directories (`../LICENSE`), which don't refer to any file in the crate tarball at all. Technically, READMEs in published crates are full of broken links.
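To make the `../LICENSE` problem concrete, here is a minimal sketch that resolves a relative link against the README's directory inside the crate package and reports when the target falls outside it. `resolve_in_package` is a hypothetical helper, not a Cargo or crates.io API, and it assumes `/`-separated paths and links without a query string or `#fragment`.

```rust
/// Hypothetical helper (not a Cargo or crates.io API): resolves a relative
/// README link against the README's directory inside the crate package.
/// Returns the normalized path, or `None` if the link has a scheme, starts
/// with `/`, or climbs above the package root with `..`.
fn resolve_in_package(readme_dir: &str, link: &str) -> Option<String> {
    // Absolute paths and full URLs are out of scope for this sketch.
    if link.starts_with('/') || link.contains("://") {
        return None;
    }
    let mut parts: Vec<&str> = Vec::new();
    for segment in readme_dir.split('/').chain(link.split('/')) {
        match segment {
            // Empty segments (e.g. from "a//b" or an empty dir) and "." are no-ops.
            "" | "." => {}
            // ".." pops a directory; popping an empty stack means the target
            // lies outside the tarball, so give up.
            ".." => {
                parts.pop()?;
            }
            name => parts.push(name),
        }
    }
    Some(parts.join("/"))
}
```

For a README at the package root (`readme_dir = ""`), `./LICENSE` resolves to `LICENSE`, while `../LICENSE` yields `None`: there is no parent directory in the tarball for it to point to.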
I assume most crate authors expect these URLs to be fixed by resolving them the same way GitHub (and other hosts) do. This is surprisingly hard to do correctly:
- Sites like GitHub and GitLab rewrite URLs in Markdown in non-trivial ways, because they have their own URL scheme for linked documents and a separate scheme for the CDN that serves images. Resolving links in READMEs written for GitHub or GitLab requires reverse-engineering what these sites do.
- URLs need to contain either a git commit hash (not all crates have this info, and it's not guaranteed to be correct) or the name of the main branch. The main branch can be renamed, so finding its correct name requires API calls or querying the repo.
- `Cargo.toml` only contains the URL of the repository, not the crate's path within it (which matters in a monorepo). Correct support for repo-relative URLs requires cloning the repository, scanning its directory structure, and parsing all `Cargo.toml` files in it to find where the crate lives. Mapping that in-repo path to a web URL is again not a simple relative-URL resolution, but a custom directory scheme that varies between code hosts.
- There are plenty of edge cases: `Cargo.toml` containing `readme = "../README"`, symlinks, relative paths in the README (`../../assets/logo.png`), and absolute paths (`/logo.png`) that can't simply be interpreted as absolute paths per the URL spec.
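The path arithmetic behind these edge cases can be sketched as follows, assuming the crate's directory in the repository has already been found (the expensive part). `repo_path` and `normalize` are hypothetical names. The sketch assumes, as such READMEs usually intend, that a leading `/` refers to the repository root, and it cannot see symlinks, which would require the actual checkout.

```rust
/// Collapses "." and ".." segments. Returns `None` if ".." would climb
/// above the root of the path.
fn normalize<'a>(segments: impl Iterator<Item = &'a str>) -> Option<Vec<&'a str>> {
    let mut out = Vec::new();
    for segment in segments {
        match segment {
            "" | "." => {}
            ".." => {
                out.pop()?;
            }
            name => out.push(name),
        }
    }
    Some(out)
}

/// Hypothetical: maps a link found in a crate's README to a path relative to
/// the repository root, or `None` if it is a full URL or escapes the repo.
/// - `crate_dir`: the crate's directory in the repo, e.g. "crates/foo"
///   (assumed to be already known, e.g. from scanning the checkout).
/// - `readme`: the `readme` value from Cargo.toml, e.g. "../README.md".
/// - `link`: the URL as written in the README.
/// Symlinks are not resolved.
fn repo_path(crate_dir: &str, readme: &str, link: &str) -> Option<String> {
    // Full URLs with a scheme are not repository paths.
    if link.contains("://") {
        return None;
    }
    // Assumption: a leading "/" means the repository root, not the host's
    // domain root as the URL spec would have it.
    if let Some(rest) = link.strip_prefix('/') {
        return normalize(rest.split('/')).map(|p| p.join("/"));
    }
    // Locate the README in repo terms, then drop its file name to get its directory.
    let readme_path = normalize(crate_dir.split('/').chain(readme.split('/')))?;
    let readme_dir = &readme_path[..readme_path.len().saturating_sub(1)];
    // Relative links are resolved against the README's directory.
    let joined = readme_dir.iter().copied().chain(link.split('/'));
    normalize(joined).map(|p| p.join("/"))
}
```

For a crate in `crates/foo`, the README link `../../assets/logo.png` maps to `assets/logo.png` and `/logo.png` maps to `logo.png`; both still need a host-specific URL prefix and a git ref before they are usable.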
crates.io has some GitHub-specific rewriting code that works in the most common cases, but anyone with a non-default configuration or a different code host is out of luck. Sadly, there's no standard for mapping in-repository paths to their public web URLs, so crate READMEs require all these fiddly fixups, and every code host does it slightly differently.
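As an illustration of how hosts differ, here is a sketch of the file-view and raw-file URL patterns that, to my knowledge, GitHub and GitLab use at the time of writing. They are not a documented standard, and `web_url`, `CodeHost`, and `LinkKind` are hypothetical names. The sketch ignores directories (which GitHub serves under `/tree/`), self-hosted instances, and percent-encoding of paths.

```rust
/// Hypothetical: the code hosts this sketch knows about.
#[derive(Clone, Copy)]
enum CodeHost {
    GitHub,
    GitLab,
}

/// Whether the link is followed as a page or embedded as an image.
#[derive(Clone, Copy)]
enum LinkKind {
    Page,
    Image,
}

/// Builds a web URL for `path` (relative to the repository root) at `git_ref`,
/// which may be a commit hash or a branch name. `owner_repo` is e.g. "owner/repo".
/// Pages get the host's HTML file view; images get a raw-file URL.
fn web_url(host: CodeHost, owner_repo: &str, git_ref: &str, path: &str, kind: LinkKind) -> String {
    match (host, kind) {
        (CodeHost::GitHub, LinkKind::Page) => {
            format!("https://github.com/{owner_repo}/blob/{git_ref}/{path}")
        }
        (CodeHost::GitHub, LinkKind::Image) => {
            // Raw files are served from a separate domain.
            format!("https://raw.githubusercontent.com/{owner_repo}/{git_ref}/{path}")
        }
        (CodeHost::GitLab, LinkKind::Page) => {
            format!("https://gitlab.com/{owner_repo}/-/blob/{git_ref}/{path}")
        }
        (CodeHost::GitLab, LinkKind::Image) => {
            format!("https://gitlab.com/{owner_repo}/-/raw/{git_ref}/{path}")
        }
    }
}
```

Even this small table needs a branch per host and per link kind, and any host not listed simply gets no working URLs, which is the situation non-GitHub crates are in today.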