Crates.io – My personal impression

Yesterday I published my first crate on crates.io, mostly as an exercise to get accustomed to the workflow.

I don't really use Git a lot yet (I prefer Mercurial for a couple of reasons, even if Mercurial is arguably less powerful in certain regards), but I have had a mostly unused GitHub account for a while.

Creating an account on crates.io was trivial, and login works fine with SSO using my GitHub account. I then used cargo login on the command line and pasted an API key that can easily be created on crates.io/me.

I then added three lines to the [package] section of my Cargo.toml file:

authors = ["My name <my@email.example>"]
description = "Short description"
license = "MIT OR Apache-2.0"

(This assumes you want to publish under either the MIT/X11 or the Apache 2.0 license, at the user's choice.)

Pretty easy until here! :smiley: The next step is even easier:

Simply type cargo publish and your crate is public.

I have to say I really like how easy it is to publish your crate, including auto-generated documentation on Docs.rs and marking the crate as MIT OR Apache-2.0 licensed (which is really a good thing to do as license issues with viral licenses can be a **** in the ***).

I guess there's more metadata I should add to the manifest later, but a few minutes are sufficient to get your project public! :+1:

There are a few things I feel a tiny bit worried about though.

The first thing: Maybe it's a bit toooo easy to publish. Apparently accidental publications happen, and it appears that crates.io has the policy to "not remove crates for any reason unless […] legally required to" (not sure if that's an authoritative statement though, see here). I checked the Crates.io Package Policies and didn't find a clear statement there either, other than, “We will do what the law requires us to do, and address flagrant violations of the Rust Code of Conduct.”

Thinking about this, I also wondered how easy it is to accidentally publish a file that's not intended for publication, and I entered cargo package --list, and got the following result:

% cargo package --list
warning: manifest has no documentation, homepage or repository.
See https://doc.rust-lang.org/cargo/reference/manifest.html#package-metadata for more info.
Cargo.toml
Cargo.toml.orig
src/lib.rs

I noticed the Cargo.toml.orig file. Did I just accidentally upload an old version of my manifest? :scream: Apparently not, as the Cargo.toml file is modified during publication and the original manifest is uploaded under the name Cargo.toml.orig. :sweat_smile:

But it made me wonder: What happens if there's accidentally an untracked file in my directory? :thinking: Will it be published too? Let's try:

% dd if=/dev/urandom of=firefox.core bs=1m count=1 
1+0 records in
1+0 records out
1048576 bytes transferred in 0.002701 secs (388183636 bytes/sec)

% cargo package --list
warning: manifest has no documentation, homepage or repository.
See https://doc.rust-lang.org/cargo/reference/manifest.html#package-metadata for more info.
Cargo.toml
Cargo.toml.orig
firefox.core
src/lib.rs

Dang! :exploding_head: That's certainly not what I want. And that file isn't even tracked in my repository:

% hg status
? firefox.core

I (personally) don't feel comfortable having to bring up the law to get rid of accidentally published data, so I decided to delete my API key again for safety purposes and to only log in when I'm about to publish a crate/package. That makes the whole process less smooth than I thought it was, unless I'm bold and don't care about the risks.

Researching a bit on the net, I noticed there is

Don't get me wrong here: I think it's important to keep hosting crates that have been published under an Open Source license. On the other hand, there may be valid reasons to remove a published file (such as a firefox coredump, which shall just serve as an example here). Perhaps there is a good reason to not publish the policies on package/version deletion and to not talk about it too much, but it leaves a bad feeling, especially when publishing is done with a low number of keystrokes (e.g. c + cursor up + enter on csh, depending on your current shell history and other projects you're working on) and there are explicit statements that data won't be deleted unless there is a legal reason to do so.

If I decide to use crates.io in the future, I'll at least try to make sure I'm normally not logged in with Cargo. But I'm not sure if I want to keep using crates.io, given the communicated policies. I'll have to think on it.

I'm sure there are a lot of people considering writing responses like, “It's your own fault if you can't operate your shell properly” or “just deal with it, that's how things work”, but that won't fix my unease, which I believe is well-founded, and I also think that it would be possible to structure things differently (either in regard to policies or in regard to default program behavior – or both). Correct me if I'm wrong.

That's it about publications, but there is also a second issue that I find a bit irritating. I personally dislike the fact that the namespace is centrally organized and non-hierarchical. I'm personally fond of distributed and decentralized solutions. A central namespace without hierarchy seems to thwart that. On the other hand, we have short crate names and, yeah, tld::example::mydomain::mycrate might read clunky. However, I guess this topic has been discussed plenty. Nevertheless, I'm a bit sad that many new systems don't utilize decentralization and there is a trend to use central services.

I hope my post didn't come off like a rant. (I feel like sometimes whenever someone criticizes something on this forum, a lot of people jump in to boldly defend Rust!) As I said a couple of times: I really love Rust and the ecosystem, and I have come across some amazing crates already. My favourites are:

And I guess I will discover many more in the future. :slightly_smiling_face:

Interestingly, if you attempt to publish while using Git as your VCS, it will refuse if there are untracked or modified files:

error: 2 files in the working directory contain changes that were not yet committed into git:

Cargo.toml
src/lib.rs

to proceed despite this and include the uncommitted changes, pass the `--allow-dirty` flag

If you want to avoid publishing accidentally, you can rely on one of the following:

  1. Don't update the version number until you do the actual release so any attempt to publish would fail due to the version number already being published.
  2. Put publish = false in your Cargo.toml.
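As a rough illustration of the second option (the package name and version below are placeholders, not taken from any real project), the manifest entry might look like this:

```toml
[package]
name = "my-experiment"   # hypothetical crate name
version = "0.1.0"
edition = "2021"

# Refuse `cargo publish` to any registry for this package.
publish = false
```

With this set, cargo publish should stop with an error instead of uploading anything. The same key also accepts a list of registry names if the goal is to allow publishing only to specific registries.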

In Tokio, we mainly rely on the first protection. Every time we do a release, we make a PR that prepares the release by updating the changelog and updating the version number. You can find an example here.

So maybe the behavior with Mercurial could/should be fixed in Cargo.

Update: I reported an issue (#10023) for Cargo regarding the problem.

That sounds like a good thing to keep in mind when working on crates that are for publication, where you don't want to accidentally release a new version.

That doesn't help me when I create a new project for testing purposes, like cargo new xyz. That will create a manifest that looks as follows:

[package] 
name = "xyz" 
version = "0.1.0" 
edition = "2021" 
 
# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html 
 
[dependencies]

If there weren't an existing package named xyz on crates.io, then I could easily upload and permanently publish any experimental code. I tried to put [package] publish = false into my ~/.cargo/config.toml, but apparently that doesn't help. Maybe a workaround is to use another default registry?

Even if there exists a way to change the default, providing a default that easily uploads and permanently publishes a crate as version "0.1.0" doesn't seem to be a very wise choice.

The default output of cargo new can't be published as-is anyway since it doesn't specify a license.

I wrote a simple tool which performs some repetitive tasks and helps me avoid silly mistakes when publishing crates. In short it's a two-step process; first it will build everything (examples and all), run tests and generate the docs and if everything was successful it will run a dry-run publishing. I inspect the results and if there's something unexpected I update the exclude-list in Cargo.toml and rerun. Next, if everything went fine, I rerun the tool and tell it to do the real publishing, this too will build everything, run tests, generate docs and all. If everything was successful it will create a repo tag using the name and version in Cargo.toml and then perform a real publishing.
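For readers who have not used it, an exclude list of the kind mentioned above could look roughly like the following. The patterns are made-up examples (a stray core dump and a scratch directory), not taken from any real manifest:

```toml
[package]
# ... name, version, and the other usual keys go here ...

# gitignore-style patterns for files that must never end up in the package.
exclude = ["*.core", "/scratch/"]

# Alternative: an explicit allow-list. `include` overrides `exclude`,
# so pick one of the two rather than combining them.
# include = ["/src", "/README.md", "/LICENSE-*"]
```

Running cargo package --list again afterwards shows whether the patterns had the intended effect.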

This has, so far, helped me avoid wrongly publishing anything by mistake, and I don't forget to run all tests and build all the examples before publishing.

With regards to things like massive core dumps (like a firefox.core), I believe there's a size limit to total package sizes, so I'm not sure crates.io would accept a typical firefox core dump.

I guess it depends on what type of file you accidentally published, but what you might do in these circumstances is pretty similar to what you might do when you commit the wrong thing to git or upload something to the internet.

  • if the file was unintentional but harmless I'd probably just leave it be
  • if you accidentally publish broken code (e.g. unsaved changes or you forgot to run cargo test before publishing) then you can yank the crate
  • if it's something sensitive like an API key then you'll need to rotate keys
  • if it's something sensitive that can't be invalidated (e.g. valuable intellectual property) then I'd a) let my boss know asap so there's a paper trail and they can kick off any internal processes, and b) contact the crates.io maintainers to explain the situation and see what can be done manually

I understand that saying "just don't make mistakes" is a pretty unsatisfying answer, but at the end of the day someone has to hit the cargo publish button and the onus is on that person to make sure they are doing the right thing.

For what it's worth, you can directly add a repository or a file on disk as a dependency, and there is also the concept of alternative registries if you want to upload your package somewhere but not go through crates.io. I've used these mechanisms plenty of times in non-open projects - you just don't hear about it because most projects talked about in the open are, well, open-source.
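To make those mechanisms concrete, here is a sketch of what such dependency entries can look like. All crate names, the URL, and the registry name are invented for illustration:

```toml
[dependencies]
# Straight from a Git repository (the URL is a placeholder).
internal-utils = { git = "https://git.example.com/team/internal-utils.git", tag = "v0.3.0" }

# From a directory on disk, relative to this manifest.
local-helper = { path = "../local-helper" }

# From an alternative registry; "company" is a hypothetical registry name
# that has to be defined under [registries] in .cargo/config.toml.
private-crate = { version = "1.2", registry = "company" }
```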

It's just that crates.io requires uploaded crates to only use other crates on crates.io. That way we won't accidentally break half the ecosystem when a random GitHub repo becomes inaccessible.

Are you sure publication without license isn't possible? The crate "a", for example, doesn't have a license.

That was published in 2014, probably before licenses were required.

That might lessen the problem if it's really the case. I trust you on this, I guess, but… I don't wanna test it now :sweat_smile:.

If publication is denied on the server side, then it still could be an issue with very sensitive data. I know, this is more of a hypothetical issue: Maybe only the manifest gets transmitted over the network and the process is aborted before any other files are uploaded; maybe we can trust crates.io to not further record/process/store any non-published crate (meta)data, etc., plus it's unlikely that such data ends up in our package directory anyway.

I think if the above issue regarding behavior with version control systems other than Git is solved, the whole thing is not that bad anymore. I still feel a bit of unease.

I usually prefer "publish buttons" (or any other buttons with irrevocable consequences) to have some sort of safety cover :wink:.

By default cargo will try to build the packaged crate before publishing. This ensures that all necessary files are included in the packaged crate and you didn't accidentally leave something broken. It will also check that the source directory doesn't get changed as part of building, as cargo treats the source dir for published crates as readonly and doesn't invalidate any build caches if they change. I don't think it runs the tests though.

Using Rust at all is a risky business! You should never forget that. For example you could use a crate and not spot it is a trojan. I think accidentally publishing something is one of the more minor risks.

crates.io really needs to have a 2-factor auth for publishing. Once this is implemented, you won't be able to publish by accident.

I think this just moves the problem around, though.

For example, what about projects that use CI to publish crates? Presumably there would still be a way to publish things automatically, so all I need to do is accidentally tag a dodgy commit and push it up and we're back where we started.

I agree that 2FA should be required, though. At the moment we're just relying on people enabling 2FA for their GitHub account.

This reminds me of the "Windows protected your PC" that users have trained themselves to click past.

I use cargo release when publishing crates to crates.io, and even though it asks "are you sure you want to release foo v1.2.3" I've still accidentally released the wrong thing (e.g. code that doesn't always compile or doing a minor bump instead of a patch).

I would love if my tools stopped me from making these mistakes, but as the saying goes, "If you design something to be idiot proof, the universe will design a better idiot".

Thanks for all your responses and opinions, of which I share some, and others not. I think a lot can be improved by changing the program behavior in such a way that accidental publications are less easy to make. This doesn't only affect whether I want to publish at a certain point, but also what gets published.

2FA will ensure that I don't accidentally publish when I don't want to do it. That's a good thing (apart from the increased security level, which is also a good thing). But it won't solve the problem that I might publish the wrong thing. E.g. if all files in a directory are published in their current state (the case if you use Mercurial or Fossil), then this can easily lead to mistakes.

Yes, accidents can always happen, dependencies can contain malware, etc. etc. But those arguments aren't a reason to ignore the problems, but rather the opposite:

Tools should be supportive in helping us to only publish what we want to publish and when we want to publish. Requiring files to be checked-in is such a supportive measure (which seems to be implemented by Cargo for Git, but not other VCSs).

Organizational workflows should be able to handle cases where things go wrong. Things always can go wrong. Yes, if credentials are published (or even transmitted), then those credentials could be invalidated, but credentials aren't the only data that may accidentally be published. I personally feel uneasy knowing that a mistake would require me to weigh whether I want to accept some data (such as a private e-mail address) being distributed by crates.io against my will for an indefinite time, or whether I would contact the legal department or even hire a lawyer. (Not sure how bad things really are; I can only read the policies and the above linked ticket, which speaks for itself.) That said, I doubt crates.io would want me to publish future crates using the platform after such a legal dispute. And given the de-facto centralized approach of the Rust ecosystem, that would be unpleasant for whoever was banned from publishing future crates.

Nonetheless, I would like to emphasize that it is desirable that actual software will stay available – especially when published under an Open Source license! I don't want to see breaking crates either, because someone decides to remove their 10 year old package that everyone depends on, and I'm (personally) not fond of data protection laws that allow anything to be deleted. But that doesn't mean non-relevant metadata or accidentally committed files need to be distributed indefinitely and deliberately by an entity (crates.io in this case) which has been informed about the issue.

Finding a compromise here will require decisions on a case-by-case basis. I'm aware that this is extra work for whoever makes the decisions in the end, and it might (or might not) even require re-thinking certain technical workflows.

I would appreciate it if these concerns weren't dismissed as quickly and viewed as rigidly as seen in the respective ticket and published policies.

what about projects that use CI to publish crates?

You can't set up a CI by accident :slight_smile:

It's not really protecting users from accidents when they're taking effort to set up a publishing pipeline and hack it to make 2FA meaningless (when both 2FA secret and login token are accessible to the CI at will that's merely 1-factor auth done twice).

At the moment we're just relying on people enabling 2FA for their GitHub account.

GitHub's 2FA is not involved in publishing. Currently the token in ~/.cargo/credentials is the one and only factor required to publish a crate. If this credential leaks, there's no second factor to stop its abuse.

It could say "Write exactly these words: 'YES I want to release foo:1.2.3 that did not pass [...] tests on [...] targets.'". (assuming a publish command that is able to run tests on many targets) You have to read and copy the warning so you are more likely to notice a problem before publishing.

While crates.io is not technically subject to GDPR, we support its aims to allow people to remove their personal information (and recent changes to Cargo have made it less likely you'll publish your email address in any case). If you email help@crates.io with proof of crate ownership and detailing what personal information you'd like to be removed, we consider this a request falling under the "legal" heading and we will remove the information. There have been a number of times when people have changed their name and requested we remove the crate versions containing their old name and we have been happy to take care of it.
