PSA: Check if your Cargo crates are clean and tagged

kornel · April 2, 2024, 5:13pm

The xz backdoor (CVE-2024-3094) took advantage of the fact that distributed "source" tarballs don't have to contain the same files as a public repository, and used bundled unit test files and an autogenerated script to hide malicious code that injected a backdoor.

Unfortunately, Cargo crates have a similar weakness. The package uploaded to crates.io can contain arbitrary files, and there's no guarantee these files come from any repository. Build scripts and proc macros are able to inject code into crates.

Crates-io crates are getting more scrutiny now (see also cargo-vet and cargo-crev), so please check your crates to ensure they're not containing any suspicious and difficult to review files. Make sure that their contents matches their public git repository, and that Cargo is able to include a valid commit hash with the upload.

Use `include` in `Cargo.toml`

Add include = ["src/*.rs", "LICENSE", "etc"] to your Cargo.toml. Only package files that are necessary to build the crate. I recommend include instead of exclude, because it's usually easier to list needed files/directories than to catch all possible dot-files and temp files that may get dropped in the project directory.

(edited) If your tests need large data files or binary blobs, I recommend excluding such tests from the published packages. Besides adding even more bloat to downloads, now this is looking suspicious and will make reviewing crates even harder.

If you're bundling C source (in sys crates), try to minimize amount of vendored files too (but keep all the LICENSE files).

Publish from a clean git repo

Include the repository property in the Cargo.toml. If you've forked a crate, don't forget to point it to your forked URL.

When you run cargo publish make sure there are no uncommitted "dirty" files, because it prevents Cargo from associating the package with a repo commit. When you publish, make sure to push this commit to the public repository, and ideally also give it a git tag (tools like cargo-release can automate that).

There is currently an ongoing effort to verify all Cargo crates against their repositories. Crates without a valid repository link and a matching clean commit are problematic.

Keep an eye on issues

I've created a maintainer dashboard that comes with an RSS feed you can subscribe to to be notified about issues in your crates. When I finish scanning crates, I'll add warnings about untracked or mismatched files in crates.io packages.

blonk · April 2, 2024, 6:13pm

I asked specifically about this a while back, and someone told me that crater runs do make use of tests, so tests should be included for the overall good of the ecosystem.

I'll follow the advice on these types of things from Rust dinosaurs, but in this case two dinosaurs are sending conflicting signals.

A mitigation to this is to wrap cargo publish around a tool that does run tests, generate docs, etc, where warnings are treated as errors. I'm sure I'm not the only one to do this. (But, to your point, I'm sure I'm also not the only one who doesn't actually run the tests on all the supported platforms...).

Btw, I would like to exclude benchmarks from publishing, but the benchmarks need to be explicitly named in Cargo.toml. If I put the benchmark in the exclude-list cargo complains that something is named, but doesn't exist. Is there a way to solve this? (I.e. define benchmarks in Cargo.toml, yet exclude them from publishing?).

weiznich · April 2, 2024, 6:22pm

First of all thanks for working on this feature. It might be really useful if done in the right way.

Unfortunately the advice currently given in the maintainer dashboard might not be applicable to in certain situations due to the way cargo publish works. I've filled Impossible to address warning in the Maintainer Dashboard (#179) · Issues · lib.rs / Lib.rs 🔰 main project · GitLab for that and I'm mainly posting here for visibliy again. Given that there as no good solution for this problem yet, I'm quite sceptical of putting a warning on crates that does not satisfy these requirements, as there is as far as I'm aware just no alternative to this setup for proc-macro crates. Maybe it would be possible to actualy calculate the diff and somewhat judge what changed? (Athough I guess that's not possible as the git commit information itself is missing in this case, right?)

cuviper · April 2, 2024, 6:29pm

We also run tests "downstream" in Fedora, where we often uncover portability issues on different architectures -- especially big-endian s390x.

kornel · April 2, 2024, 6:38pm

I think crater runs can also use git repositories? There are conflicting requirements, but I'm siding with optimizing for code reviews and typical usage of dependencies. Rust Foundation said it's getting expensive to host crates.io and the traffic keeps growing exponentially, so I think it's a good idea to trim tarballs to minimum, especially when tests need fixtures (there are crates with kilobytes of code and megabytes of junk).

kornel · April 2, 2024, 6:42pm

Could you use .cargo_vcs_info.json to run tests from the repo instead? Especially once tools like cargo-goggles will make it possible to verify that the crates-io tarball is indeed derived from the repo?

cuviper · April 2, 2024, 6:48pm

The rpmbuild process runs offline, so we would have to capture the repo as an additional source archive, or maybe even instead of the published crate. We definitely want to test what we use though, so there can't be any differences (by cargo-goggles or whatever else).

And if we always go grab the tests from the repo, what was the point in separating it?
(remember that the xz backdoor also specifically targeted deb/rpm build environments)

kornel · April 2, 2024, 6:51pm

The stuff in the dashboard is just a recommendation/FYI, you're not required to address these things if it's not possible.

I'm working on various fallbacks. For circular dependencies, perhaps Cargo will need to add ability to disable checks for dev-dependencies, or include the commit hash even if there are "dirty" files.

Vorpal · April 2, 2024, 6:53pm

This would break arch Linux rust packaging. See Rust package guidelines - ArchWiki which recommends downloading the crate file from crates.io and running tests from there.

weiznich · April 2, 2024, 7:00pm

I know that I'm not required to fix this things immediately, but I remember that you mentioned in an other context that warnings in the maintainer dashboard result in downranking the crate in various listings. So while I'm not strictly required to fix these issues, it's still a considerable disadvantage of not being able to do that. Also a warning on the lib.rs page might discourage users further from using a specific crate.

(Again: I'm not saying that this is generally a bad idea, I merely want to point to various edge cases that somehow need to be handled before actually warn against crates with this kind of issue.)

kornel · April 2, 2024, 7:13pm

These issues don't automatically affect ranking. I'm aware of your limitation, and expect there will be many more, so I'll be looking for solutions before rolling it out. Unfixable false positives are not helping anyone.

kornel · April 2, 2024, 7:17pm

This workflow is already broken. Such tests are not supported by Cargo, and there are already crates where the tests are not packaged correctly (because they rely on files in the workspace that aren't in the crate, or because they use fixtures that have been excluded, but inline #[test] depending on these fixtures could not have been excluded).

Vorpal · April 2, 2024, 7:20pm

Speaking of the ranking, a dependency being listed as our of date or significantly out of date makes little sense currently. I have been on e.g. nix 0.27.1 with 0.28 being released hours ago and it was listed as "significantly" out of date. Yet if I'm on 1.1 of another dependency and 1.6 was just released it is not "significantly" out of date?

If this is just about semver it should say so, because "significantly" implies lagging behind a lot time wise. Which is very different.

Vorpal · April 2, 2024, 7:25pm

In that case the rust project should offer official guidelines for how to best package Rust packages for Linux distros, homebrew on mac etc.

Right now both Arch and Redhat/Fedora are using tests from crates.io apparently (if I understand @cuviper correctly). If that is not the correct way there should be official documentation on what is.

Right now I consider your recommendation to strip tests from crates actively harmful at worst and misguided at best.

It would be better to extend the ecosystem so that you could have something like cargo test --all-deps run tests for all your dependencies as well to ensure that they still pass in your context (different rust version, different os, different target etc can all affect tests, so this would be better for reliability).

kornel · April 2, 2024, 7:26pm

The fix is coming:

github.com/rust-lang/cargo

cargo publish has bad error message when explicit `[[bench]]` isn't in `package.include`

opened 05:40PM - 18 Feb 24 UTC

dhardy

C-bug Command-publish Command-package S-needs-team-input

### Problem `cargo publish` fails ### Steps Using the [latest rand comm…it](https://github.com/rust-random/rand/commit/b1427c6447ce6f2aaeda967f777d6daebed18164): ```sh $ cargo publish --dry-run Updating crates.io index Packaging rand v0.9.0-alpha.0 (/home/dhardy/projects/rand/rand) Updating crates.io index Verifying rand v0.9.0-alpha.0 (/home/dhardy/projects/rand/rand) error: failed to verify package tarball Caused by: failed to parse manifest at `/home/dhardy/projects/rand/rand/target/package/rand-0.9.0-alpha.0/Cargo.toml` Caused by: can't find `uniform` bench at `benches/uniform.rs` or `benches/uniform/main.rs`. Please specify bench.path if you want to use a non-default path. ``` ### Possible Solution(s) _No response_ ### Notes The relevant part of the `Cargo.toml`: ```toml [[bench]] name = "uniform" harness = false ``` This *could* have a `path` specified... but as far as I can tell, `cargo publish` is the only thing not happy to discover the default path (`benches/uniform.rs`). Despite this, the benchmark works fine: ```sh $ ls -l benches/uniform.rs -rw-r--r--. 1 dhardy dhardy 2696 Feb 13 11:39 benches/uniform.rs $ cargo bench --bench uniform --features small_rng Compiling rand v0.9.0-alpha.0 (/home/dhardy/projects/rand/rand) Finished `bench` profile [optimized] target(s) in 2.28s Running benches/uniform.rs (target/release/deps/uniform-38f8da4b7c78a479) samplei8/SmallRng/single time: [1.5141 ns 1.5143 ns 1.5145 ns] [snip] ``` ### Version ```text cargo 1.78.0-nightly (fc1d58fd0 2024-02-09) release: 1.78.0-nightly commit-hash: fc1d58fd0531a57a6b942a14cdcdbcb82ece16f3 commit-date: 2024-02-09 host: x86_64-unknown-linux-gnu libgit2: 1.7.2 (sys:0.18.2 vendored) libcurl: 8.6.0-DEV (sys:0.4.71+curl-8.6.0 vendored ssl:OpenSSL/1.1.1w) ssl: OpenSSL 1.1.1w 11 Sep 2023 os: Fedora 39 [64-bit] ```

kpreid · April 2, 2024, 7:30pm

As someone who has made this recommendation in the past: yes, I understand that's true and Crater does also use git. However, the value of this is not just Crater, but also putting the tests in a permanent archive that will (hopefully) persist (at least in mirrored/archive form) beyond any individual Git repository's being hosted at GitHub. (For example, sometime in the future, there could be some event that prompts mass migration off GitHub, which would likely include some users choosing to delete their repositories entirely and leave their crates.io packages as the only remaining thing.)

It is useful for debugging as well as research to be able to look at an immutable package and say “Does this pass its tests? No? Is that a sign of a stealthy compiler bug, or did an updated dependency break it, or was it somehow broken from the start?" And yes, some packages have tests broken because they depend on things that weren't packaged. That doesn't mean the idea of including test code at all is flawed, just that it's sometimes not done well because the current tooling isn't very good at supporting it. And all tests are potentially useful information about developer intent, whether or not they are automatically checkable.

I do agree that it would be good if downloading a package didn't necessarily require downloading its tests, but that is not a unique problem (documentation and examples also should in principle be optional, and those can't be dispensed with or you end up with missing parts on docs.rs) and, in my opinion, should be addressed by allowing a published package to consist of multiple archives, rather than by excluding those things from the public record entirely.

Vorpal · April 2, 2024, 7:48pm

Related to this, I went ahead and opened this thread to start a discussion about running tests recursively for your dependencies (over on the internals forum, as it feel more like the beginnings of a pre-RFC and not fitting on URLO): A way to run tests for all your dependencies, recursively? - Rust Internals

blonk · April 2, 2024, 7:52pm

I think I made up my mind on this: I'll continue to include tests, as long as they do not require any auxiliary data (or can generate that data itself), and as long as they aren't huge lists of auto-generated tests.

I'm open to be convinced otherwise, but I've had so much help figuring out the finer details of an API by reading the crates' tests that I find the "test should be part of the public record" argument particularly convincing.

Vorpal · April 2, 2024, 7:55pm

@cuviper How does Fedora/Red Hat deal with the myriad of library dependencies (in multiple semver versions) for RPM packaging? I'm genuinely interested.

If you don't package each and every dependency separately (and do it more like Arch with a download phase relying on the packaged lock file), you would end up only running tests for the "top" package.

If you do package every dependency separately, that seems potentially quite a lot of churn for an ecosystem such as Rust, where you don't do dynamic linking anyway (and presumably the library packages are thus just source code used during build of dependants).

alice · April 2, 2024, 7:57pm

Is there any way to filter the dashboard somehow? I'm an owner of a lot of crates for various reasons, e.g. as a bus-factor backup, or for things that are no longer actively maintained (e.g. tokio-compat-02). Most of the warnings are for crates that I just want to ignore.

Topic		Replies	Views
What to include when publishing a crate? help	26	3589	April 6, 2021
Crates.io – My personal impression community	21	2543	February 2, 2022
Should cargo bundle all local files when packaging a crate?	21	3694	January 12, 2023
Cargo Crusader 0.1 - Test the downstream impact of Rust crate changes before publishing announcements	21	8975	January 12, 2023
Should release contain test directories? help	3	368	September 30, 2022

PSA: Check if your Cargo crates are clean and tagged

Use include in Cargo.toml

Publish from a clean git repo

Keep an eye on issues

Related topics

Use `include` in `Cargo.toml`