Validating if packaged crate content matches VCS

With the recent discussions about crate security, I thought it would be interesting to build a tool to validate crate code.

Typically when reviewing a crate for functionality and security, I read the code published on GitHub (or another VCS host). But I believe there is currently no guarantee that the code in the VCS matches the published crate. Malicious code could be injected at several points (a malicious author, a hijacked cargo binary, a crates.io hack, an S3 hack).

My proof of concept is pretty simple. It looks like this:

➜  ./run.sh serde = "1.0.130"
Validating serde at version 1.0.130
Downloading https://crates.io/api/v1/crates/serde/1.0.130/download
Unpacking to ./crate
Version 1.0.130 commit hash is 65e1a50749938612cfbdb69b57fc4cf249f87149
Repo URL is "https://github.com/serde-rs/serde"
Github source is git@github.com:serde-rs/serde.git, checking out..
Crate located in subdir ./serde

############################
# VALIDATING REPO CONTENTS #
############################

Ignoring Cargo.toml (TODO)
matching ./serde/LICENSE-APACHE
matching ./serde/build.rs
matching ./serde/README.md
Ignoring .cargo_vcs_info.json
matching ./serde/crates-io.md
matching ./serde/Cargo.toml.orig
matching ./serde/LICENSE-MIT
matching ./serde/src/std_error.rs
matching ./serde/src/lib.rs
matching ./serde/src/private/ser.rs
matching ./serde/src/private/de.rs
matching ./serde/src/private/mod.rs
matching ./serde/src/private/doc.rs
matching ./serde/src/private/size_hint.rs
matching ./serde/src/integer128.rs
matching ./serde/src/de/utf8.rs
matching ./serde/src/de/mod.rs
matching ./serde/src/de/impls.rs
matching ./serde/src/de/value.rs
matching ./serde/src/de/ignored_any.rs
matching ./serde/src/de/seed.rs
matching ./serde/src/ser/fmt.rs
matching ./serde/src/ser/mod.rs
matching ./serde/src/ser/impls.rs
matching ./serde/src/ser/impossible.rs
matching ./serde/src/macros.rs

No code injections found
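
The comparison step in the output above can be sketched roughly as follows. This is a minimal illustration using only the Rust standard library, not the actual `run.sh` script: the names `mismatched_files`, `list_files`, and `IGNORED` are hypothetical, and it assumes the crate has already been unpacked and the repository already checked out at the recorded commit. It skips `Cargo.toml` and `.cargo_vcs_info.json`, as the tool does, and compares `Cargo.toml.orig` with the repository's `Cargo.toml`, since Cargo keeps the original manifest under that name when packaging.

```rust
use std::collections::BTreeSet;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Files that Cargo generates or rewrites when packaging, so they are skipped
/// (assumption: this mirrors the "Ignoring ..." lines in the tool output).
const IGNORED: &[&str] = &["Cargo.toml", ".cargo_vcs_info.json"];

/// Recursively collect all file paths under `dir`, relative to `root`.
fn list_files(root: &Path, dir: &Path, out: &mut BTreeSet<PathBuf>) -> io::Result<()> {
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if path.is_dir() {
            list_files(root, &path, out)?;
        } else {
            let rel = path.strip_prefix(root).expect("path is under root");
            out.insert(rel.to_path_buf());
        }
    }
    Ok(())
}

/// Return the packaged files (relative paths, sorted) that are missing from the
/// repository checkout or whose bytes differ. Empty means everything matched.
fn mismatched_files(crate_dir: &Path, repo_dir: &Path) -> io::Result<Vec<PathBuf>> {
    let mut packaged = BTreeSet::new();
    list_files(crate_dir, crate_dir, &mut packaged)?;

    let mut mismatches = Vec::new();
    for rel in packaged {
        if IGNORED.iter().any(|name| rel.as_path() == Path::new(name)) {
            continue;
        }
        // The original manifest is packaged as Cargo.toml.orig.
        let repo_rel = if rel.as_path() == Path::new("Cargo.toml.orig") {
            PathBuf::from("Cargo.toml")
        } else {
            rel.clone()
        };
        let in_repo = repo_dir.join(&repo_rel);
        let same = in_repo.is_file() && fs::read(crate_dir.join(&rel))? == fs::read(&in_repo)?;
        if !same {
            mismatches.push(rel);
        }
    }
    Ok(mismatches)
}

fn main() -> io::Result<()> {
    let args: Vec<String> = std::env::args().collect();
    if args.len() != 3 {
        eprintln!("usage: {} <unpacked-crate-dir> <repo-crate-dir>", args[0]);
        return Ok(());
    }
    let mismatches = mismatched_files(Path::new(&args[1]), Path::new(&args[2]))?;
    if mismatches.is_empty() {
        println!("No code injections found");
    } else {
        for path in &mismatches {
            println!("MISMATCH {}", path.display());
        }
    }
    Ok(())
}
```

For a workspace such as serde, the second argument would be the crate's subdirectory in the checkout (`./serde` in the run above), not the repository root. A byte-for-byte comparison is the strictest choice; it will also flag harmless differences such as line-ending changes, which may or may not be what you want.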

Before I continue and start running this at scale, I would like to know whether my assumptions are correct and whether this would be useful to anyone. Any suggestions welcome!


Someone recently looked at this issue at Code and Bitters. The code used for their analysis, which might be helpful, is on GitHub at ericseppanen/crates_io_analyzer ("Investigating published Rust crates").

Not quite the same goal, but I've been thinking that it would be useful for reviews if there was a tool for easily diffing published crate versions against each other. This may give larger diffs to review than the commits of a source repository, but it has the advantage of looking directly at the code you're thinking of using, independent of any other sources. After all, crates.io is a version control repository — a coarse-revisioned one, but also the one that truly matters to Cargo.


cargo-crev can diff published crate versions, with the cargo crev crate diff and cargo crev review --diff commands.


Oh, that's good! None of the introductory material to crev that I've read mentioned that that existed, as far as I noticed. It seemed so important but I didn't get the impression that there was any support for it, only the message “read the files (of a single version) with whatever tools of your own choice”.

Thanks! This is literally what I intended to do, saves me a lot of time...
I may reach out to the author to set up some continuous scanning together.

