Should cargo bundle all local files when packaging a crate?

I was recently bitten by some behaviour in Cargo that I found quite surprising. I'm not sure if it's just a failure on my part to read the documentation sufficiently thoroughly, or if it really is quite a dangerous/unintuitive default. I'd like to hear whether others in the community find this a good idea or if there's agreement it should be changed.

Specifically, when you run cargo package or cargo publish, cargo grabs all local files in the directory (as long as they're not in .gitignore), and includes them in the package that is to be published. While this is a little more convenient than having to list everything explicitly, it feels like a default that fails the wrong way when people don't know about it. I can easily envision scenarios where people publish (to the whole world on crates.io, and irrevocably!) things they didn't intend to.

My suggestion would be to remove this feature entirely, and require a listing of desired files in Cargo.toml (maybe printing a warning when there are files present that aren't listed?). Alternatively, the feature could remain, but as an explicit opt-in requiring users to select it by adding a flag to Cargo.toml.

The problem with both of these approaches, of course, is that they'd be a non-backwards-compatible breaking change. They'd also be a little less convenient than having everything work automagically. However, I think safety is more important than the very minor amount of effort this feature saves. Thoughts?

I've also a filed a bug in the cargo repo, which links to some previous discussion/mentions of the issue.

7 Likes

A better solution might be to add an explicit include section to the manifest, and warn whenever packaging includes something not covered by that list.

I know I keep forgetting about this behaviour (because I never use it), but it's also a slight pain to check what is and isn't included in the package. target/package is also used to do a build of the package, so it contains files that aren't actually in the package.

That said, you say "safety" is important... safety from what? Is anyone seriously storing all_my_passwords.txt or nuclear_launch_codes.txt in their crate source folders?

Yeah, this mechanism already exists (without the warnings, though), but it's not the default, and (in my opinion) not called out enough. I personally published two crates before finding out about it. I'd very much like to go with the explicit route by default.

Discussion on IRC suggests that yes, at the very least one person did something very much like that :slight_smile:

There's also the case of keeping some test/sample data around that you don't necessarily want to publish, but which you're using to try out your code.

I was caught out by it in that I published some personal TODOs about what I wanted to do with the code, as well as helper scripts I used in the release process. Nothing embarrassing or "bad", but just not what I intended to put out there for people to see.

We have multiple tools in Ruby-land that do this kind of thing: I always choose the ones that automatically include stuff over the ones that have an explicit list. The key is this bit:

If I don't want something to be included, it should be ignored. That's the bug, imho. Re-listing everything that's already being tracked by git is not just annoying, but also error-prone too. I've caused bugs because I added a new file but then forgot to update the list.

But this ties cargo to git in a fashion that it is usually not. If I'm not using git, how would I ignore files?

For that reason, I prefer manifests. The Manifest - in absence of other hints - could considered * with a warning if a file is included that would be matched by .gitignore)

2 Likes

Well, this is really tying it to version control more than git specifically. Cargo also has hg support, and I'd imagine that it would work the same way there. You're only stuck here if you don't use version control at all, and then you have other problems :slightly_smiling:

I disagree. It introduces a feature which relies on the presence of another tool besides cargo for a crucial feature of cargo. This is a very different kind of dependency then most other parts of cargo have. While all other features of cargo are contained in the sense that a user has to explicitly ask for them ("use git to download this!"), this isn't.

(You might notice that I'm not a fan of the "do one thing and do it well" mantra)

2 Likes

I think in general the issue is that I don't think Cargo should be mandating any particular VCS or style of workflow. Not everyone uses Git, and not everyone uses it in the same way. The VCS and Cargo are orthogonal tools, and integrations should be optional, explicit, and opt-in.

I'd totally be in support of an installable external Cargo sub-command that managed the list for you based on your VCS. Even Cargo itself could issue a warning or something if files are present but not listed?

But regardless, I still think it's a false economy: it saves a little effort and care, at the expense of potentially catastrophic bad effects. It's like adding code that speeds up compile times by 0.1%, but 0.1% of the time it also nukes your disk — not worth it.

I think this may be a bit of an idiosyncrasy of the Ruby world that those outside of it find rather strange :slight_smile: I'm not sure it's fair to port over those cultural practices to Rust without it being clear that the Rust community wants things to be that way too.

Unless I'm missing it or it's in another file, I think Cargo only takes into account Git repos and .gitignore, so yes, I think this feature is a bit like requiring that you must use Git to use Rust: https://github.com/rust-lang/cargo/blob/master/src/cargo/sources/path.rs

Git is not required. If you use git, you can list excluded files in .gitignore. If you don't use git, you can put them in Cargo.toml. Using .gitignore by default merely provides convenience.

If your VCS is not (yet) supported by Cargo's directory-walking code, then in the worst case you may need to list your excluded files in two places (e.g. both .hgignore and Cargo.toml).

Sure; the real issue isn't the VCS dependency (though being Git-only is rather odd), it's the question of whether to fail safely or fail unsafely by default.

The discussion of Git or other exclude mechanisms just relates to how problematic it's likely to be in practice, since possibly some users will be partially protected by Git's ignore mechanism. But even those users are in trouble if they put a temporary file into the directory to test something out, but don't add it to .gitignore.

The real question is: is it worth having a potentially catastrophic private information leakage in order to save a bit of effort, and possibly avoid some easily-fixable bugs relating to forgotten files? (I'd argue that "src/**/*" is likely to be commonly-listed explicit include, so I don't know how prevalent that will be in practice).

Thought it doesn't just package files that are tracked by git, but all files that are not explicitly ignored (even untracked ones). Now if you're rigorous with .gitignore that won't be a problem but I suspect that not everyone is (I am not, or at least was not until I nearly published some log files to crates.io ^^). I think having cargo only publish files that are tracked by a versioning tool (or at least give a warning if it is about to publish untracked files) could be a good compromise, but I find the current situation quite error-prone.

1 Like

My issue with this is that I don't like cramming every random filename that might or might not be in a repository into .gitignore. Log files, .S/.ll files, executables for temporary builds, crash dumps... it's like asking someone to exhaustively list every crime they've never committed or intend to commit.

Also, if Cargo really is going to lean on source control, shouldn't it be making a fresh checkout of the repo somewhere when packaging to ensure the package is tied to a specific commit? :stuck_out_tongue:

2 Likes

Simply copying up a copy of the current working directory is very wrong. Using .gitignore for this purpose, rather than utilizing Git by, say, doing an export of a particular tag to a temporary directory, is the wrong way to do this. That means that if you have any local uncommitted changes to the working copy, they will be published but not available in source control. It also means that if you have temp files, core dumps, etc, as @DanielKeep points out, or something I do sometimes is write myself local, throwaway notes in a file that is not checked in, but is not in .gitignore so I will see it in Magit and remember to look in it for what I was last working on, then those files will also be copied up and released.

Now, due to various issues like this, or accidentally checking things in, I have a strict rule of never putting any credentials in a source code repository, even in files that are covered with .gitignore. However, some people do and can accidentally commit them, but with this behavior even if they don't accidentally commit them, they may accidentally upload them without knowing.

I don't think it's the convenience support for Git that's the most problematic; it's that it doesn't really support how you would expect exporting from a Git repository to work, but instead just using interpretation of .gitignore in a way that supports some people's workflow but produces very surprising, and even possibly dangerous results, for others.

Most other packaging systems I've used require you to do something to explicitly include files in a manifest, though many support globs so if you really do want everything you can frequently just write * or something of the sort.

A good approach would be, at least, to warn the user about files being included in the Cargo archive while they are excluded from the VCS (if any). Exceptions could then be added to the Cargo configuration file to not print warnings.

(I know, I know, I could easily try this myself, but I'm sick and I'm posting from my phone)

Does cargo use the contents of .gitignore literally for this purpose, or defer to something like git-archive or git-ls-files if the crate's a git repo? What happens To files that are listed in, say, the excludes file, but not .gitignore?

1 Like

If I'm not mistaken the desired behaviour can already be archived by using include in Cargo.toml. So why not just change the way cargo new works to automatically add a include with all needed globs for the standard case (like rust files in src). This would at last not introduce any breakage, I think.

Additionally providing some VCS specific checks would round it up (e.g. lint on build and/or package if Cargo includes some files not included by VCS; defaultly (but not mandatory) bind the packaging to commits etc.).

I also observed this strange behaviour, because I had a .core file which cargo wanted to upload (and I think it failed, because the package was too big). At least I'd expect that cargo would warn me if any non-standard file would be uploaded (with non-standard I mean, any file other than *.rs, Cargo.toml, README* and LICENSE*). It would also be nice if cargo package would list the files that are going to be included.

That could become quite a long list.

Isn't explicitly including files as much error prone?

To be honest, the ratio of files I want to exclude versus the files I want to include is most of the time very small and I like that the default is "package everything in my project except what I told you not to". If you want to work the other way around you just have to add a couple of lines in your manifest. If you use temporary files, just use a naming convention like: note-* or test-* or a directory and exclude that.

The fact that Cargo reads your .gitignore is just a convenience bonus.

Well, maybe it could warn about files that exist in the directory but are not included in the manifest (or warn about files included in the package that are not in the git index)? I rarely want to include files other than *.rs files in my projects. And none of the temporary files is named *.rs. Also, I rarely want to upload files which are not managed by git. So for me, including all files in the directory just feels wrong.