Alternate crates-io index/repo?

My ultimate goal is this:

I need a private (in-house), offline crates.io mirror.

This happens to be one of the most UNDOCUMENTED and VERY FRUSTRATING things about Rust and the Rust ecosystem, and everything I can find via GOOGLE is BEYOND fragile and brittle and does not work fully and completely.

My requirement is quite simple (in my opinion):
a) Read a text file that lists the crates (and versions) that are approved.
b) If a crate is not listed in the "approved list", do not mirror it.
c) Run some script 'outside', connected to the internet.
d) The result should be a TAR file of some sort.
Move/copy that TAR file to the offline environment.
e) Unpack the TAR FILE.
Serve files from that TAR FILE.

f) Also support locally created/maintained crates in some form via a GITLAB-type server.
This seems easier: we do not want to publish to the mirror server.
Why? We want to go through the "GitLab code review process" for our internal crates.

Here is a recent post of mine on the matter of self-hosted registries, containing a couple of links you might find useful:

It looks like OP is definitely aware of the first one you’ve linked, judging by


Edit: Judging by

it sounds like doing the thing OP wants might be as simple as (some way or another) coming up with a Cargo.lock file that contains all the crate versions you need. Of course I could be missing something, as I have never tested any of these tools myself so far, but if we count the lockfile as “a text file that lists the crates (and versions) that are approved”, and add a manual step of turning the directory into a TAR, one should hopefully be able to follow all of a) through e).


Nicely spotted. After you so diligently checked Panamax issues, I read through some myself and discovered a link to zerus in #106, which—according to its README—can build a crates.io mirror with solely the dependencies from a Cargo.toml file.

Yeah, Panamax does not work for me. I wish it would; I really wish it would.

Reasons:

  1. I cannot use a caching server; Panamax is a caching server.

  2. I can only mirror the specific list of crates and no other crates; Panamax has no approved whitelist of crates.

  3. When serving crates, the server has zero access to the internet.

  4. Think two separate computers: one has internet, the other has zero (air gap). You sneakernet a CD with files between the computers.

Thus I must (a) fetch all crates outside, (b) create a CD image to bring into the closed environment, and (c) then run the server.

I'm here because $JOB {defense industry} requires the use of a Memory Safe Language in certain projects. RUST is that language of choice.

The problem with lots of the suggestions, ideas and things I have read (found via GOOGLE) is this:

a) It is a partial solution.
b) Rust and its tooling have changed, and I have to do things with other syntaxes.
c) One thing I read suggested using [registry], but it is no longer supported.

What I would really like is this command sequence, with these invented command line options:

Step 1: Run this "invented command"

```
TOOLNAME --mirror-crates-only \
         --whitelist LIST_OF_CRATES.txt \
         --mirror_dir_name SOME_DIRNAME
```

WHERE:

  • LIST_OF_CRATES.txt: a text file with lots of entries in the form name=version.

  • SOME_DIRNAME is the output directory; it is opaque to me, the tool deals with it.
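Reading that name=version list and enforcing requirement (b) is the easy part of such a tool. A minimal std-only sketch, assuming one entry per line with blank lines and `#` comments allowed (the comment syntax is an assumption, not part of the request). It keeps every listed version, since one crate may legitimately be approved in more than one version.

```rust
use std::collections::{BTreeMap, BTreeSet};

// Hypothetical LIST_OF_CRATES.txt format: one `name=version` per line.
// Blank lines and lines starting with '#' are treated as comments (an assumption).
fn parse_line(line: &str) -> Option<Result<(String, String), String>> {
    let line = line.trim();
    if line.is_empty() || line.starts_with('#') {
        return None;
    }
    match line.split_once('=') {
        Some((name, version)) if !name.trim().is_empty() && !version.trim().is_empty() => {
            Some(Ok((name.trim().to_string(), version.trim().to_string())))
        }
        _ => Some(Err(format!("malformed entry: {line:?}"))),
    }
}

// Every approved version per crate; the same crate may appear with several versions.
fn parse_whitelist(text: &str) -> Result<BTreeMap<String, BTreeSet<String>>, String> {
    let mut approved: BTreeMap<String, BTreeSet<String>> = BTreeMap::new();
    for (idx, line) in text.lines().enumerate() {
        if let Some(entry) = parse_line(line) {
            let (name, version) = entry.map_err(|e| format!("line {}: {e}", idx + 1))?;
            approved.entry(name).or_default().insert(version);
        }
    }
    Ok(approved)
}

// Requirement (b): anything not on the list is not mirrored.
fn is_approved(approved: &BTreeMap<String, BTreeSet<String>>, name: &str, version: &str) -> bool {
    approved.get(name).map_or(false, |versions| versions.contains(version))
}

fn main() {
    // FOO is the hypothetical crate from later in this thread.
    let text = "# approved crates\naddr2line=0.24.2\nadler2=2.0.1\nfoo=1.2.3\nfoo=3.2.1\n";
    let approved = parse_whitelist(text).expect("whitelist should parse");
    for (name, versions) in &approved {
        println!("{name}: {versions:?}");
    }
    println!("foo 1.0.1 approved? {}", is_approved(&approved, "foo", "1.0.1"));
}
```

Malformed lines are reported with their line number instead of being silently skipped, which is probably what a reviewed, security-sensitive list wants.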

Step 2: Run this command:

```
TOOLNAME --mirror-tools-only --version 1.88.0 --version 1.89.0 \
         --TARGET <name> --TARGET <name> \
         --mirror_dir_name SOME_DIRNAME
```

WHERE:

  • I need to be able to specify multiple targets for the "rustup" part
  • Examples include: ARM64, RiscV32bit, CortexM series, X86 Linux - etc.
  • I need to be able to specify the version of RUST to mirror

Step 3 is on me: I transport "SOME_DIRNAME" into my closed environment.

Giving the IT group a tar.gz does not help. Why? Because everything in that directory must be hand-inspected {security requirement} and viewed by a human eyeball.
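Since every file has to be looked at by a human anyway, a generated inventory of SOME_DIRNAME (relative path and size, sorted) may help the reviewers, and two inventories from consecutive sneakernet drops can be diffed. A minimal std-only sketch; it deliberately does not hash files (that would need an external crate or a tool such as `sha256sum`), and it skips symlinks rather than following them.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

// Recursively collect (relative path, size in bytes) for every regular file
// under `root`, sorted by path. DirEntry::file_type does not follow symlinks,
// so symlinks and other special files are skipped; a real review script would
// probably want to flag them instead.
fn inventory(root: &Path) -> io::Result<Vec<(PathBuf, u64)>> {
    let mut out = Vec::new();
    let mut stack = vec![root.to_path_buf()];
    while let Some(dir) = stack.pop() {
        for entry in fs::read_dir(&dir)? {
            let entry = entry?;
            let file_type = entry.file_type()?;
            let path = entry.path();
            if file_type.is_dir() {
                stack.push(path);
            } else if file_type.is_file() {
                let size = entry.metadata()?.len();
                let rel = path.strip_prefix(root).unwrap_or(path.as_path()).to_path_buf();
                out.push((rel, size));
            }
        }
    }
    out.sort();
    Ok(out)
}

fn main() -> io::Result<()> {
    // SOME_DIRNAME is the placeholder name used in this thread.
    let root = std::env::args().nth(1).unwrap_or_else(|| "SOME_DIRNAME".to_string());
    for (path, size) in inventory(Path::new(&root))? {
        println!("{size:>12}  {}", path.display());
    }
    Ok(())
}
```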

====

The remaining steps are on an "AIR GAPPED" network. No internet access.
The ONLY way things are put on this network is via "sneakernet" + CD-ROM images.

Bringing source code in is an easy process; bringing a pre-built binary in raises way too many questions. For example, I'm doing embedded work, so bringing the source of "bindgen" into the room and building it is an easy task for me. Bringing in a prebuilt binary? Oh hell no. That is not happening.

What is also very problematic: Red Hat installs "rustc" in a very different place (a very non-Rust-type location, i.e. /usr/bin/rustc) and distributes other Rust things in other random places. GRR!!!

Step 4: In that closed environment, I can "ping" any SERVERNAME.local.

Meaning: think of an "old school" LAN party in a conference room with no internet connection, where all machines can talk to each other over the local LAN. "DNS" works for the "SERVERNAME.local" domain.

Other key things: on this isolated network, I would expect HTTP (not HTTPS); if HTTPS is required, expect a self-signed certificate. We would use GIT + SSH keys for all Git access so we can script/automate things.

Step 5: On that CARGO_SERVERNAME.local machine I run:

```
TOOLNAME --server --datadirectory SOME_DIRNAME
```

That server should be able to serve out all the files that were in SOME_DIRNAME.

Step 6: A simple means to explain to my users (10 to 20) how to configure Cargo and related tools so that they connect ONLY to "SERVERNAME.local" and nothing else.
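The Cargo mechanism for Step 6 is source replacement: a `[source.crates-io]` entry with `replace-with` redirects every crates.io lookup to another source. A sketch that renders the per-user `$CARGO_HOME/config.toml` plus the `RUSTUP_DIST_SERVER` / `RUSTUP_UPDATE_ROOT` environment variables for rustup. The source name `in-house-mirror`, the host `CARGO_SERVERNAME.local`, and the `/index/` and `/rustup` paths are assumptions that must match whatever the mirror server actually exposes. With HTTPS and a self-signed certificate, Cargo's `http.cainfo` setting can point at the CA file.

```rust
// Hypothetical mirror layout: sparse index at <base>/index/,
// rustup files at <base> and <base>/rustup. Adjust to the real server.
fn render_cargo_config(mirror_base: &str) -> String {
    let base = mirror_base.trim_end_matches('/');
    format!(
        "[source.crates-io]\n\
         replace-with = \"in-house-mirror\"\n\
         \n\
         [source.in-house-mirror]\n\
         registry = \"sparse+{base}/index/\"\n"
    )
}

// Environment variables that make rustup download toolchains from the mirror.
fn render_rustup_env(mirror_base: &str) -> String {
    let base = mirror_base.trim_end_matches('/');
    format!("export RUSTUP_DIST_SERVER={base}\nexport RUSTUP_UPDATE_ROOT={base}/rustup\n")
}

fn main() {
    let base = "http://CARGO_SERVERNAME.local"; // placeholder host
    println!("# $CARGO_HOME/config.toml\n{}", render_cargo_config(base));
    println!("# shell profile\n{}", render_rustup_env(base));
}
```

Because crates.io is replaced rather than supplemented, Cargo has no other registry to fall back to, which is the "ONLY SERVERNAME.local" behaviour asked for.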

NOTE: We do not need to publish to that SERVERNAME.local - instead, we would "git clone/push/commit" - to some GITLAB (or GERRIT) server on that ".local" domain. Why? We would need to perform our own set of code reviews - and GITLAB/GERRIT etc - would provide that solution rather simply and within our current "approved development process".

Steps 4/5/6 could be emulated on Linux with the tool "unshare -n", which runs a command in a new network namespace whose only interface is a loopback device that starts out down; see the unshare(1) Linux manual page for more details.

My test environment is a VMware Linux image on my laptop. When I need to drop the network to emulate the closed, air-gapped room, I click to disable the VM network, and re-enable it as needed.

But yeah, that's what I am dealing with.

And it is not a simple process.

Panamax has MANY of the features I need but it is missing the last little bit that makes it possible.

Yeah, and I am facing the "I am very new to Rust" problems.

What does “caching server” mean to you?

It can. I just explained how: you can provide a Cargo.lock file, which is not exactly the format you’ve asked for, but pretty close. From what I can tell (by now I even went ahead and roughly tested it for you once), it does actually accept a somewhat minimized / fake Cargo.lock following the pattern of

```
[[package]]
name = "addr2line"
version = "0.24.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = ""

[[package]]
name = "adler2"
version = "2.0.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = ""

[[package]]
name = "autocfg"
version = "1.5.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = ""

 ⋮
 ⋮
 ⋮
```

where there’s basically nothing left besides the crate name and version, anyways!
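Such a minimal lock file does not have to be typed by hand; it can be generated from a name=version list. A sketch that emits exactly the fields in the pattern above (registry source, empty checksum). It only lists what it is given, so transitive dependencies are not resolved, and whether Panamax is fully happy with empty checksums is not confirmed.

```rust
// Emit one [[package]] table per (name, version), in the shape shown above.
// Crate names and versions never contain '"' or '\', so plain interpolation is safe.
fn minimal_lock(entries: &[(&str, &str)]) -> String {
    let mut out = String::new();
    for (i, (name, version)) in entries.iter().enumerate() {
        if i > 0 {
            out.push('\n'); // blank line between [[package]] tables
        }
        out.push_str(&format!(
            "[[package]]\n\
             name = \"{name}\"\n\
             version = \"{version}\"\n\
             source = \"registry+https://github.com/rust-lang/crates.io-index\"\n\
             checksum = \"\"\n"
        ));
    }
    out
}

fn main() {
    let approved = [("addr2line", "0.24.2"), ("adler2", "2.0.1"), ("autocfg", "1.5.0")];
    print!("{}", minimal_lock(&approved));
}
```

The same crate may appear more than once with different versions; the function does not deduplicate or check compatibility.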

If you wanted, you could probably rewrite Panamax itself trivially, in a few minutes, to get rid of the need for the source and checksum fields in the first place; or see how to customize it even further, in case a TOML format isn’t to your liking at all, to read the same kind of information from a different file format instead.

From what I can tell from the docs, isn’t that the whole point of panamax? Are you saying you can’t get it to run without internet access? Or is your problem with installing panamax itself without internet access?

What does “caching server” mean to you?

A caching server, to me, would work as follows:

  • Cargo requests a crate or needs a crate for a build.
    First, Cargo looks locally (i.e. ${CARGO_HOME}) and perhaps does not find the requested crate. Normally, Cargo would then connect directly to 'crates.io' and fetch the required files.

  • In this case, Cargo reads the '${CARGO_HOME}/config.toml' file and is redirected to the PANAMAX server.

  • Panamax acts sort of like the old transparent SQUID caching server, or an HTTP caching proxy of some type.

  • If PANAMAX has the files, PANAMAX provides them and no outside connection is required. That would be great.

  • If PANAMAX does not have the files, PANAMAX contacts "crates.io" and fetches the files.
    In my case, the server where PANAMAX runs has no outside connection. How is that error reasonably reported back to me? To fix it, I need to (A) download the required files, (B) transport them onto the server, and (C) place them in the correct location. I can probably handle (A) and (B), i.e. download a tar file with the required stuff. But how do I accomplish step (C)?

  • In the caching case, PANAMAX also saves a copy in case there is a second (future) request, and there probably will be one every hour or so, because CI/CD processes run automatic builds and each automatic build starts fresh with a clean/empty checkout directory.

It can, I just explained how, you can provide a Cargo.lock file,

I did not understand what you meant; now I understand it better. I just have to create my own Cargo.lock file. It was my understanding that Cargo.lock is created by Cargo and is not something I should ever edit, touch or modify, nor is it really documented well. I guess in this case I have to do this 100% manually. One document says NEVER, NEVER; another document says YES, do this. Sort of confusing.

EDIT: In reading the Panamax documentation, there is no option where Panamax accepts a Cargo.lock file, so how do I tell Panamax to read that file?

If you wanted, you could probably re-write panamax itself trivially

Trivial for you, who fully understands Rust; non-trivial for me. 10+ years of Python and 30-40+ years of C, but only a few months of Rust. It is not a trivial thing for me to do. (Blood dripping from forehead as I beat my head into the brick wall called RUST.)

From what I can tell from the docs, isn’t that the whole point of panamax?

YES, it sort of is, but I go back to the definition of a caching server. Perhaps I took things literally, went down a rabbit hole I should not have gone down, and was misled by reading the documentation literally.

The POINT of PANAMAX, as I understand it, is to act as a caching server, which transparently fetches from upstream and saves (caches) a copy of things locally.

The idea here is to populate the cache, then disconnect and run from the cache. And deal with the cache errors as they occur.

What you are describing is how to make the "side effects" do what I need. The docs are not clear in that way. I have been asking many questions, I thought respectfully, and often the answer is: "Go read this 500-page document" (PDF, web page, whatever). I have read that damn thing endlessly, and several others, to the point where I think my eyes will pop out of my head. What seems to be required is about 50 little one-line things that are documented but scattered across that 500-page document. Finding those 50 little needles in the haystack is really hard.

Or is your problem with installing panamax itself without internet access?

No, I can do that; it's pretty simple. "Outside", I git clone panamax, then "cargo vendor" panamax, and tar the source up. I copy the tar into the closed area and build with special non-obvious flags, and I have a panamax executable. The next steps are not explained very well.

Once Panamax (Executable) is ready - the next steps would be:

(A) How to specify to Panamax which files I want to copy over. Your suggestion is via the Cargo.lock file.

(B) Where is the cache directory? I can zip/tar that and move it to the new machine.

(C) I then run PANAMAX, but it is my understanding that the first step is to "SYNC". The README for Panamax says:

Once you have made the changes to mirror.toml, it is time to synchronize your mirror!

Maybe I am reading this wrong, but I cannot do that: the "mirror server" will have no connection to the outside world, and thus no means for Panamax to sync with the outside world.

Or is your problem with installing panamax itself without internet access?

That seems pretty simple:

a) GIT CLONE panamax
b) cargo vendor panamax
c) ZIP/TAR the source and move it to the closed room
d) Unpack, and build with the special OFFLINE options {more than just '--offline'}
e) Install the binary, then arrange a startup script to run panamax

The documentation is probably incomplete. I'm working from the PR description I had previously linked, and I double-checked the source code (main.rs) to see whether the option does in fact still exist. But it turns out you can also check the output of panamax sync --help for some built-in command line argument documentation for the sync subcommand. I used something like

```
panamax sync --skip-rustup --cargo-lock path/to/Cargo.lock path/to/mirror-directory
```

and it ended up synchronizing the crates.io index (that one is always included completely apparently), and for downloaded crate files it only fetched and included exactly those from the lock file.

The sync command happens before the transfer to the target machine and it's what populates the directory for the mirror which you'll want to transfer afterwards. That same directory should previously have been set up using the init subcommand. The location is whatever you tell it in the command line argument for each of the invocations.
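On the internet-connected side, the two steps just described (init once, then sync from the lock file) could be scripted like this. A sketch that only wraps the exact invocations quoted in this thread; `approved/Cargo.lock` and `SOME_DIRNAME` are placeholder paths, and the `mirror.toml` check assumes init writes that file into the mirror directory, as the README excerpt above suggests.

```rust
use std::path::Path;
use std::process::Command;

// `panamax init <mirror>`: sets up the mirror directory (including its mirror.toml).
fn panamax_init(mirror: &Path) -> Command {
    let mut cmd = Command::new("panamax");
    cmd.arg("init").arg(mirror);
    cmd
}

// `panamax sync --skip-rustup --cargo-lock <lock> <mirror>`: the invocation quoted above.
fn panamax_sync(lock: &Path, mirror: &Path) -> Command {
    let mut cmd = Command::new("panamax");
    cmd.arg("sync")
        .arg("--skip-rustup")
        .arg("--cargo-lock")
        .arg(lock)
        .arg(mirror);
    cmd
}

// Run a command and abort on failure.
fn run(mut cmd: Command) {
    let status = cmd.status().expect("could not start panamax");
    if !status.success() {
        eprintln!("{cmd:?} failed: {status}");
        std::process::exit(1);
    }
}

fn main() {
    let mirror = Path::new("SOME_DIRNAME"); // placeholder
    let lock = Path::new("approved/Cargo.lock"); // placeholder
    if !mirror.join("mirror.toml").exists() {
        run(panamax_init(mirror)); // init is only needed once
    }
    run(panamax_sync(lock, mirror));
}
```

After the sync, SOME_DIRNAME is the directory to inspect and carry into the closed room.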


Incidentally, if you also want to use this panamax tool for mirroring the rust toolchains (the compiler etc), the init also creates a config file for you where you could select specific rust versions if you wanted to (the default config should have sufficient comments already to show how it's done - I haven't tested any of it myself though).

And that's a prime example of my frustration with all things Rust…
I too tried looking for something obvious like Python's "argparse" in main.rs… only to not see anything obvious.

As I described earlier: one line here, one line there, all scattered. It is documented, just scattered around across about 500 pages in that Cargo book. Needles in a haystack.

The tool you want doesn't exist (yet).

For air-gapping of individual projects, there's cargo vendor.

Without vendoring, cargo fetch downloads the crates needed for the build (based on Cargo.lock if present), which you can use to prepopulate the cache.

You can then ship ~/.cargo/registry/{index,cache} to another machine.

Cargo can use a local cache offline when you add --offline flag or set env CARGO_NET_OFFLINE=1.
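For several projects, each `cargo fetch` can be pointed at one shared CARGO_HOME, so that its registry/{index,cache} accumulates the union of all their dependencies before it is shipped. A sketch; `proj-a`, `proj-b` and `export-cargo-home` are placeholders, and the shared home is made absolute because that is probably the safer choice.

```rust
use std::path::{Path, PathBuf};
use std::process::Command;

// One `cargo fetch` per project, all sharing the same CARGO_HOME, so the
// downloaded index and .crate files accumulate in a single place.
fn fetch_into(manifest: &Path, shared_home: &Path) -> Command {
    let mut cmd = Command::new("cargo");
    cmd.env("CARGO_HOME", shared_home)
        .arg("fetch")
        .arg("--manifest-path")
        .arg(manifest);
    cmd
}

fn main() {
    let shared_home: PathBuf = std::env::current_dir()
        .expect("no current directory")
        .join("export-cargo-home");
    // Hypothetical project checkouts; substitute your own.
    for manifest in ["proj-a/Cargo.toml", "proj-b/Cargo.toml"] {
        let status = fetch_into(Path::new(manifest), &shared_home)
            .status()
            .expect("could not start cargo");
        assert!(status.success(), "cargo fetch failed for {manifest}");
    }
    // export-cargo-home/registry/{index,cache} is what would then be shipped.
}
```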

You can also replace crates.io entirely with a local directory:

For mirroring/filtering/airgapping of crates.io, make sure to use the source replacement feature, and not just an alternate registry. When you set up your own registry without configuring it as a crates.io source replacement, it will be treated as a completely different additional set of packages that are incompatible with crates.io and can't replace any crates.io packages.

I can understand it's challenging to learn a new programming language. Still, there's no reason why things need to be named the same. The most popular Rust library for command line argument parsing is called clap. In the case of panamax's main.rs file, I'm not sure where you've been looking, since literally the entire file consists of nothing else but a (declarative) usage of clap, describing the entire subcommands tree.


If it were only one project (I define a project as a crate), it would be simple.

In my case, it is 10 to 20 people working on numerous different crates.
What we cannot support is 10 to 20 separate solutions (one per person).

And we need a CI/CD solution that can automatically build the system.

Hence we really require a UNION of multiple vendored projects.
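`cargo vendor` can already vendor several projects into one directory: besides the main manifest, it accepts additional manifests via `--sync`, and `--versioned-dirs` names every vendored directory with its version, so two versions of the same crate can sit side by side (and the layout is easier to inspect by eye). A sketch that builds that invocation; the project paths and the `vendor` output directory are placeholders. On success, `cargo vendor` prints the source-replacement configuration needed to use the directory, which the sketch passes through.

```rust
use std::path::Path;
use std::process::Command;

// cargo vendor --versioned-dirs --manifest-path <first> --sync <other>... <out_dir>
// Returns None when no manifest is given.
fn vendor_union(manifests: &[&Path], out_dir: &Path) -> Option<Command> {
    let (first, rest) = manifests.split_first()?;
    let mut cmd = Command::new("cargo");
    cmd.arg("vendor")
        .arg("--versioned-dirs")
        .arg("--manifest-path")
        .arg(first);
    for manifest in rest {
        cmd.arg("--sync").arg(manifest);
    }
    cmd.arg(out_dir);
    Some(cmd)
}

fn main() {
    // Hypothetical project checkouts; substitute your own.
    let projects = [Path::new("proj-a/Cargo.toml"), Path::new("proj-b/Cargo.toml")];
    let mut cmd = vendor_union(&projects, Path::new("vendor")).expect("need at least one manifest");
    let output = cmd.output().expect("could not start cargo");
    // Pass through cargo's output, including the suggested config snippet.
    print!("{}", String::from_utf8_lossy(&output.stdout));
    eprint!("{}", String::from_utf8_lossy(&output.stderr));
    assert!(output.status.success(), "cargo vendor failed");
}
```

Each project keeps its own Cargo.lock; the shared vendor directory is simply the union of what those lock files need. It does not apply a whitelist by itself.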

I am not saying it must have the same name; I'm saying that just reading the main.rs file, it was not obvious what is going on.

Yeah, the problem I'm running into now is this:

The suggestion is to create a Cargo.lock file and have Panamax use that.

The problem is that with multiple developers and multiple crates, you really need/want/desire a union of all the crates required by all of those crates.

And that means one requires MULTIPLE versions of the same crate (i.e. crate X requires FOO version 1.0.1, and crate Y requires FOO version 1.3.4), but the Cargo.lock method only supports a SINGLE instance/version of a crate, not multiple versions.

GRRR...

I don't think this is true, how did you find this out?

It’s true for compatible versions such as 1.0.1 and 1.3.4. Cargo will not allow a lock file, or a build, to contain both of them at once. Those are compatible versions, so this is not usually something you need, but there can be reasons:

  • There is a bug in 1.3.4 that makes it not compatible in practice.
  • You’re trying to reproduce an old build exactly.

So, a single lock file might sometimes suffice for small amounts of airgapped work, but it isn’t a very effective general solution.
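The rule behind this: Cargo considers two versions compatible when their left-most non-zero component is equal, and one resolve (lock file) holds at most one version of a crate per compatibility range. A small sketch of that rule; pre-release and build-metadata details are ignored, and the helper names are made up for illustration.

```rust
// Compatibility range of a version under Cargo's semver rules:
// 1.x.y -> "1", 0.3.x -> "0.3", 0.0.x -> "0.0.x" (each 0.0.x is its own range).
// Pre-release/build suffixes are simply stripped here, which is a simplification.
fn compat_key(version: &str) -> Option<String> {
    let core = version.split(|c: char| c == '-' || c == '+').next()?;
    let mut parts = core.split('.').map(|p| p.parse::<u64>().ok());
    match (parts.next(), parts.next(), parts.next(), parts.next()) {
        (Some(Some(major)), Some(Some(minor)), Some(Some(patch)), None) => Some(if major > 0 {
            major.to_string()
        } else if minor > 0 {
            format!("0.{minor}")
        } else {
            format!("0.0.{patch}")
        }),
        _ => None,
    }
}

// Can these two versions of the same crate appear together in one lock file?
fn can_share_lockfile(a: &str, b: &str) -> Option<bool> {
    let (ka, kb) = (compat_key(a)?, compat_key(b)?);
    Some(a == b || ka != kb)
}

fn main() {
    // The two examples discussed in this thread.
    for (a, b) in [("1.0.1", "1.3.4"), ("1.2.3", "3.2.1")] {
        println!("{a} + {b} in one lock file: {:?}", can_share_lockfile(a, b));
    }
}
```

So FOO 1.0.1 and 1.3.4 cannot coexist in one resolve, while 1.2.3 and 3.2.1 can.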

Yes, but it’s not true that panamax cares about this.


I just finished testing it, and it downloads them without issue, even compatible versions.

@duaneellissd I have also just noticed there are some crate…name.badsha256 files appearing for the crates where I left the checksum empty, and I’m not 100% certain whether or not that would cause any issue for panamax serving them down the line, so that’s a caveat on my previous claim that the sha was unnecessary - I’m not quite certain on that anymore.

The issue I am facing is a usability one.

I need an offline (closed environment) mirror of crates.io, and that seems impossibly hard.

From a HUMAN point of view, it is this: the HUMAN requests that a new crate be added to the offline closed system. A human should only have to answer 2 questions:

a) The name of the crate, and
b) the VERSION of the crate.

Preferably in simple terms: "name=version", which is what is expected in the Cargo.toml file.

The suggestion I was given to create the mirror was to use a Cargo.lock file, and Cargo creates this via a command line option. So, next step:

Step 1: List all required crates in the Cargo.toml file.
Step 2: Use Cargo to convert the file. BRICK WALL.
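One possible way past that wall without modifying Cargo: generate a throwaway Cargo.toml from the name=version list and let Cargo resolve it outside, with `cargo generate-lockfile`. Note that in Cargo.toml a bare "1.2.3" means ^1.2.3 (any compatible version); an exact pin needs "=1.2.3". A second, semver-incompatible version of one crate can be added by renaming it with the `package` key; two compatible ones make resolution fail and would need a separate throwaway manifest. The package name `approved-crates` and the `<name>_<version>` alias scheme are made up for this sketch. The resulting Cargo.lock includes transitive dependencies and real checksums, and could then be passed to `panamax sync --cargo-lock`.

```rust
use std::collections::HashSet;
use std::fs;
use std::path::Path;

// Throwaway manifest pinning each approved (name, version) exactly with `=`.
// A second version of a crate gets a renamed key (`package = "..."`),
// since a TOML table cannot repeat the same key.
fn render_manifest(entries: &[(&str, &str)]) -> String {
    let mut out = String::from(
        "[package]\nname = \"approved-crates\"\nversion = \"0.0.0\"\nedition = \"2021\"\npublish = false\n\n[dependencies]\n",
    );
    let mut seen_names = HashSet::new();
    let mut seen_pairs = HashSet::new();
    for &(name, version) in entries {
        if !seen_pairs.insert((name, version)) {
            continue; // exact duplicate entry in the list
        }
        if seen_names.insert(name) {
            out.push_str(&format!("{name} = \"={version}\"\n"));
        } else {
            let alias = format!("{name}_{}", version.replace('.', "_"));
            out.push_str(&format!(
                "{alias} = {{ package = \"{name}\", version = \"={version}\" }}\n"
            ));
        }
    }
    out
}

fn main() -> std::io::Result<()> {
    // FOO is the hypothetical crate from this thread.
    let approved = [("addr2line", "0.24.2"), ("foo", "1.2.3"), ("foo", "3.2.1")];
    let dir = Path::new("approved-crates");
    fs::create_dir_all(dir.join("src"))?;
    fs::write(dir.join("src/lib.rs"), "")?; // Cargo needs at least one target
    fs::write(dir.join("Cargo.toml"), render_manifest(&approved))?;
    // Then, outside with internet:
    //   cargo generate-lockfile --manifest-path approved-crates/Cargo.toml
    Ok(())
}
```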

If you want to try this: (A) pick a few command line tools, for example "bindgen", "cbindgen" and "cargo-download", and build them in a closed environment, specifically (B) using a private, isolated crates.io mirror of some sort.

I agree Cargo would not want to support two versions of the same crate for a project.
BUT consider this case: Parent requires CHILD(A) and CHILD(B); CHILD(A) wants FOO-1.2.3, and CHILD(B) wants FOO-3.2.1. What should the vendored directory have?
(I do not know, I did not try this), but I would think the vendored directory should have both versions.

That is sort of my situation: Human(A) needs 1.2.3, and Human(B) needs 3.2.1...
Both seem to be supportable in the Cargo.lock file (that Panamax requires).

So at this point I think I have 2 options:
A) Modify Cargo to support what I need in this scenario.
UGH... I do not want to maintain a private branch of such a core tool.
B) Write my own Cargo.toml -> Cargo.lock converter.

GRR... Everything I have to do with Rust seems to have many people telling me: "Oh, that is so easy, stop fighting it, it should just work." And in reality, on the surface, it does; but as you dig in and do the complete job, it falls apart, making the entire Rust experience so bloody painful.
