This is a very different topic from the ones I usually post here, but I have an upcoming international trip of around 8 hours, and I would love to use some of that time to work on some of my Rust projects.
Since I will have no Internet access during this time, I was reflecting on the things I rely on most in my development cycle. Of course, I do a lot of googling (which won't be possible), but I also rely a lot on crates.io and docs.rs.
I am looking to find out whether someone has already attempted something similar, what the struggles were, and what offline resources are available for download in preparation (like maybe the X most-used crates from crates.io and some downloadable form of a docs.rs archive).
Any tips or ideas on how to make something like this possible?
If you already have a project and downloaded its dependencies you can just do cargo doc --open to build and read your docs locally as you would on docs.rs.
One way is to likewise have any dependencies already downloaded, e.g. with cargo fetch. While on the plane you can run cargo with --offline. If there are other crates you anticipate using, you could make a dummy project relying on them and cargo fetch that.
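To make that concrete, here's a minimal sketch of the preparation (the crate names are just examples; substitute whatever you anticipate using):

```sh
# Before the flight: a throwaway project whose only job is to pull
# crate sources into the local cargo cache (~/.cargo/registry).
cargo new dummy && cd dummy
cargo add serde anyhow clap   # example crates, not a recommendation
cargo fetch                   # downloads sources without building

# On the plane: --offline forbids network access, so cargo fails fast
# instead of hanging if something turns out to be missing.
cargo build --offline
cargo doc --open              # local, browsable docs, like docs.rs
```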
That's very, very interesting. I had no idea that could be done.
It would be lovely to know how much space it takes. I've been thinking about maybe having an external 1TB/2TB SSD just for that archive. Gonna start some googling to see if I find some more useful context.
I also found that you can download an offline version of Stack Overflow to use with Kiwix. The whole archive is another 80GB, but I'm still considering whether it's a good idea.
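In case it helps anyone, serving a Kiwix archive locally is a one-liner (the ZIM filename here is made up; use whichever dump you actually grab from the Kiwix library):

```sh
# Serve the archive on localhost; then browse http://localhost:8080
# from any browser while offline.
kiwix-serve --port 8080 stackoverflow.com_en_all.zim
```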
I was interested in your research here, having been a liberal user of offline docs and fetches because of unreliable bush internet. This was made even worse by floods that have washed away much of the local infrastructure twice this year. I've had several internet outages exceeding a week.
132GB is a lot of data though - it would probably take a couple of days to download, and use up more than half of my 200GB monthly cap! It'd be interesting to know how much the repos change. If panamax permits or makes it easy, would you be so kind as to report back sometime on how much extra data it takes to sync after a suitable interval?
This is a very short period for sure, but I just ran it now to see what would happen.
For context, nightly Rust was updated, which accounted for most of the 5m33s the whole process took.
After that, 23,175 crate items were either updated or added. The initial execution downloaded ~560,000 crate items, so that's roughly a 4% increase/change in 24h, give or take.
Disk usage went from 123GB to 143GB, a whopping 20GB delta.
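For anyone wanting to reproduce this, my panamax workflow is roughly the following (paths are arbitrary; treat this as a sketch of the CLI rather than a reference):

```sh
# One-time setup: creates the mirror directory plus a mirror.toml
# where you can toggle what gets mirrored (rustup, crates, etc.).
panamax init ~/crates-mirror

# Syncs are incremental, so re-running this after a day only pulls
# the delta reported above.
panamax sync ~/crates-mirror

# Serve the mirror over HTTP so cargo can be pointed at it.
panamax serve ~/crates-mirror
```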
It tends to confirm what I suspected, which is that my existing targeted approach (anticipating outages as best I can, and selectively downloading as appropriate) is probably more feasible for me than downloading the entire Rust corner of the internet ;). At least until I can afford Starlink (unlikely), or floods and fires drive me into an urban area (which is close to an 'over my dead body' kind of deal).
Back in September, I spent an entire week camping and fishing along the coast and if you wanted phone reception you'd need to hop in a 4WD and drive 10 minutes out to the cliff.
The only real preparation I did was to download a couple of books for reference material, make sure any necessary dependencies were already downloaded (the usual suspects: serde, anyhow, rowan, salsa, clap, etc.), and set net.offline = true in my .cargo/config.toml file so builds wouldn't accidentally try to touch the network. This worked out pretty well because a compiler/VM is typically quite self-contained, and I already put an emphasis on a low dependency count to keep my crate's build times snappy.
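For reference, the setting in question is just this (project-local .cargo/config.toml shown; the user-wide ~/.cargo/config.toml works the same way):

```toml
# .cargo/config.toml
[net]
offline = true   # same effect as passing --offline to every cargo command
```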
I do not know if any of the off-the-shelf registry mirrors offer it, but a simple way to reduce storage usage of a mirror is to filter what crates you actually provide.
Picking the set of crates to vendor (it's not just "the set of crates you're already using") is an art, but a reasonable cut would be to only mirror the most recent version in any semver-compatible range.
(e.g. you'd mirror serde 1.0.147 and serde 0.9.15, but not other 1.m.p or 0.9.p versions.)
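A quick sketch of that cut, using the semver crate (the version lists here are invented): group versions by their caret-compatibility class and keep only the newest in each class, which reproduces the serde example above.

```rust
use std::collections::HashMap;

use semver::Version;

/// The caret-compatibility class of a version: versions in the same
/// class are interchangeable under a `^` requirement. For >=1.0.0
/// that's the major number; for 0.x.y it's (0, x); for 0.0.y every
/// patch is its own class.
fn compat_class(v: &Version) -> (u64, u64, u64) {
    match (v.major, v.minor) {
        (0, 0) => (0, 0, v.patch),
        (0, minor) => (0, minor, 0),
        (major, _) => (major, 0, 0),
    }
}

/// Keep only the most recent version in each semver-compatible range.
fn trim_versions(versions: Vec<Version>) -> Vec<Version> {
    let mut newest: HashMap<(u64, u64, u64), Version> = HashMap::new();
    for v in versions {
        let class = compat_class(&v);
        match newest.get(&class) {
            Some(best) if *best >= v => {} // already have a newer one
            _ => {
                newest.insert(class, v);
            }
        }
    }
    let mut kept: Vec<Version> = newest.into_values().collect();
    kept.sort();
    kept
}

fn main() {
    let versions = ["0.9.14", "0.9.15", "1.0.146", "1.0.147"]
        .into_iter()
        .map(|s| Version::parse(s).unwrap())
        .collect();
    // Prints 0.9.15 and 1.0.147, dropping the older version in each range.
    for v in trim_versions(versions) {
        println!("{v}");
    }
}
```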
A slight refinement of that scheme is to then query your trimmed-down registry mirror for any non-caret[1] dependency version and also vendor the top of that range. This should ensure that cargo update/cargo add will always choose versions that you've mirrored the source for.
You could also collect all of the lock files and provide those versions so that --locked succeeds, but I suspect this would result in mirroring enough of the full collection of crates that it wouldn't be worth it over just mirroring the full repository.
From the other end, you can of course come up with some "significance" heuristic and only vendor crates which meet that heuristic; the most primitive version is just whatever's already in your Cargo.lock.
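For that most primitive version, cargo vendor already does the work: it downloads exactly the versions pinned in Cargo.lock into a local directory and prints the config snippet that redirects crates.io to it.

```console
$ cargo vendor
# ...downloads everything in Cargo.lock into ./vendor, then prints
# the snippet to add to .cargo/config.toml:
[source.crates-io]
replace-with = "vendored-sources"

[source.vendored-sources]
directory = "vendor"
```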
[1]: The ^ver syntax is the "semver-compatible" operator, and the default operator if no operator is used.
I wrote zerus for this use case, although it relies on Cargo's sparse HTTP registry protocol. The benefit here is that you don't need to download all the crates, just the ones you need.