Idea for scalable code review/trust system


#1

Edit: I have created project github page https://github.com/dpc/crev, which contains much more up to date information, and will evolve with time.

Below is the original post, that is no longer accurate, though the core idea is the same. Feel free to keep discussing and giving feedback in this thread.

I was toying with an idea on how to make code reviews in the community scalable. Basically - it bothers me that I’m pulling hundreds of dependencies, and I have no time to review all of them.

This is not a Rust-exclusive problem, but if I am to try to implement it, it would be written in Rust and targeting Rust, at least initially, so I would like to run it through other people, and see if they think it even makes sense.

I wrote down some notes when thinking about it, and I’m pasting them as is below. Feel free to comment.

crev - A scalable code review system for Open Source projects

The code reviews would get signed and added to source code repository itself as “review proofs”.
They should be easy to manipulate: paste, move, deleted and submitted as PRs.

Workflow:

  • someone reviews the project/parts of it,
  • generates review comments
  • submits a PR
  • code review is merged and included with the source code

Eventually, tooling could use it to verify trust in dependencies, etc.

Some minimal level of support different comment syntax might be needed.

TODO:

  • ask for opinions

Review proof format

Each review would append a new review proof in a file. Eg. for src/main.rs, a file ./.crev/src/main.rs.
If file didn’t exist, it would be created.

TODO: Consider other path schemes. Eg. what if file gets renamed.

Example format of each review proof:

date:  2018-08-01 22:43:39-07:00 
author: Dawid Ciężarkiewicz  
author-id: salty 2ZuU37oJ1erD85AzVohXq6Y74GHv2hjNYB9fu3P5o9rsGSvRo19HK2wTL4MLma3N6gVFqXN81VTqQ6apBhc5Kezq
scope: thoroughness=good; understanding=good
revision: bd049182c8a02c11b4bde3c55335c8653bae7e2e
hash: sha-256 2cff6162e5784b263c6755b6d8b5a7933064956701008060fb47c24b06d630ee
signature: 5V1c1P5a8dqDVMPhwqnDF39ZrHpaw7jhetEgHyPUkjM8tYvugPzDJ3xyhD9WdJQ4AjwYkN2XdWhnTB3GTRMJuAEd

Some explanations:

  • revision - Revision system id (eg. git commit)
  • hash - Hash of the file (type and digest)

scope

TODO: Rename?

This can be added with time, as an extension to specify details about the review

  • thoroughness
    • none (seconds to minutes)
    • some (minutes to hours)
    • good (hours to days)
    • ultimate (days to weeks)
  • understanding:
    • i don’t understand (none)
    • i understand some (some)
    • good
    • ultimately (ultimate)
  • trust:
    • i distrust completely
    • i distrust
    • i trust
    • i trust completely
  • authorship:
    • i didn’t write
    • i wrote some
    • i wrote most
    • I wrote all

Shortest version: “thoroughness” + “understanding”

Signature/verification algorithm:

  • identify a block (series of key-values, from date to signature.*
  • remove comment characters, trim
  • remove signature.* headers
  • sort
  • sign that
  • signature.* headers are for signatures and extensions (eg signatures of ids, for inline WoT)

Signing possibilities:

Commands

crev id

Prints out the id

crev id --create

  • Ask for passphrase
  • Write the file with:
    • wrapped randomly generated secret
    • id (public key-like)

crev sign [file ...]

Ask about the level.

If no file, repeat for every file.

For each file:

Then just generate the signature and add to the file.

crev status [file ...]

If no file, repeat for every file.

For each file:

  • Find each review
  • Match against known ids:
    • Id store per-project (checked in the revision system)
    • Id store per-language (eg. Rust lang team, possibly updated periodically)
    • Id store per-user (~/.config/crev/tursted-ids, etc)
    • Print and ignore unknown ids
  • verify signatures - exit if wrong
  • find newest review
  • check if fresh (git diff f53b6cfda77460c0d3e71208ff211fcbe2078ea5..HEAD -- Cargo.lock)

Possible file statuses:

  • not reviewed
  • stale
  • fresh

Potential options:

  • minimum: length, trust-level, authorship level, etc.
  • ignore per-project trusted-ids store
  • --summary - minimum possible level

review diff [file ...]

Just like status but print the diff


Effect system for panics and transitive unsafe
#2

I like the idea, though it’s hard to mash “scalable” and “human workers” together (mechanical turk?). The other part is how do we know whether to trust the review which says “I trust this code”.

For the latter issue, having multiple reviewers will ease the issue (until you get multiple malicious "reviewer"s). That said, how likely is anyone to spend time to submit false reviews? Hopefully not at all.


Raises the questions, how do you link “project” with “crate”, or how do you relate “I reviewed this feature which spans a cross section of modules from these crates” to a score? Eh sorry this makes the review -> score mapping sound hard :stuck_out_tongue:

A coverage scale rather than a time scale probably works better (e.g. I reviewed RustC for 1 year, I don’t understand and came out more confused)


A tangent (maybe for a separate topic), but still in the realm of scaling “can I trust this dependency”, having good code coverage gives some confidence to the quality of the code, and if we were to extend that quality into trust, it would be “have an external suite of tests which run against the crate”, i.e. external automated specification tests.


#3

I’m very interested in this! I’ve almost started writing the same thing myself.

Rust code is distributed as crate tarballs, which are not connected to any repository. Verification has to happen on either the crate file (as a whole, before extraction) or files in the crate file (extracted from the tarball). Putting signatures in the repo seems to complicate this, as you’d have to search the repo — including its history — to find signature. Since signatures are per file, then you can’t catch unauthorized deletion of a file. I guess verification tool would have to check that all files refer to the same repo hash (and verify the hash against the repo?) to prevent signatures from different crate versions being combined together to create something new.

I expect there would need to be possibility of having multiple reviewers for each crate. I may not trust crate’s author. There may be companies that insist on reviewing everything themselves. Some may prefer to have a second pair of eyes look through the code.

So the system as a whole should support multiple sources of reviews, and probably aggregation of reviews from various sources. For example, Mozilla may want to review some crates they use in Firefox and publish that. Another company may review crates they use. It’d be great if some third party could collect all these reviews together (coupled with some Web of Trust) to increase review coverage.

And it’s probably worth thinking about negative reviews. If I review something as “this is a virus!”, then the author of the crate shouldn’t be able to suppress it (by rejecting PR that adds bad .review to the crate). OTOH the system must be able to deal with malicious reviews (if I falsely accuse my competitor’s crate of being malicious, my reviews shouldn’t be trusted any more).

“Diff” should be a recognized type of review. When there’s a new version of a crate with 1MM lines of code, I won’t be able to re-review 1MM lines of code, but I may review parts that were changed. This way if you trust (anyone else) reviewing previous version, and you trust me checking the diff, you can trust the new version too.


#4

Well, it’s as scalable in a sense that code review proof can be shared with the rest of the community.

I did not went into design, but I’m planing to add a decentralized web of trust. So just because someone random reviewed a given code, doesn’t change much. Only after that reviewers id gets trusted (signed) and included on personal/per-project/per-langauge trust lists, it will start counting towards trust.

The plan is for reviewes to be per-file. That’s the natural unit: it a good balance between coarse/fine-grined, it allows cryptographic hash of content, it can be copied between repositories etc.

The coverage is handled by beinjg per-file. If you reviewed 5 files, only these 5 files would get your proof of review. At very least, that would give other people view into what haven’t been reviewed at all.

I doubt you could do very thoroug review of the whole codebase, so your review would be in some files of the follwoing form:

scope: thoroughness=some; understanding=some

You’ve spent between minutes to an hour per-file, and have some (limited) understanding.


#5

I think the better approach will be to follow cargo audit. In other words to write cargo plugin which will be able to fetch crate from crates.io and will allow to publish signed review which will include reviewed crate hash and some additional information to help with automated processing. (maybe with some functionality which will allow to say, “this version didn’t changed much, see review for this version”) Users will be able to use the same plugin to inspect reviews of their dependencies. To start things off reviews database can be stored in github repository a-la advisory-db. We could even use PGP keys provided in github accounts to verify signatures.

Also I don’t see enough motivation for per-file reviews, to make things significantly easier I think it’s better to use crates as a reviewed unit.

More complicated functionality can be built using this as a foundation.


#6

For the interface I imagine it’d be best to make a Cargo repository that only exposes “trusted” crates.

  1. I’d configure who I trust
  2. I’d configure how much risk I can tolerate (quality of review required, number of reviewers, exceptions allowed)
  3. The repository would only expose crates I trust in its version of the index built for me
  4. cargo install would be able to only install trusted crates

#7

I think your points are better solved with something like TUF. While reviews and crate signing are somewhat related topics and probably can be integrated into a single framework, I think it’s better to keep scopes separated for early stages of prototyping.


#8

Please excuse my brevity. I’ll have more time for discussion in the next few days.

IMO:

  • If this system is to gain any traction, it has to be language-agnostic and thus tied to the source code itself, and not packaging tools.
  • The review should tied to the source code - code often moves between repos, crates get split, files copied etc. That’s why I decided to make the file a natural coersness level and file checksum.

I don’t think it’s a big deal. I am trying to address code review (as in: "I’ve looked at this code, checked for harmful code, judged quality, etc.) and not integrity/authenticity of crates. Integrity of crates coudl be addressed better at different levels, and I’m quite content with how cargo/crates.io handles it already.

Reviewing per-crate doesn’t scale that well. Quite often only handful of files will change between releases, and being able to reuse code reviews of the files that didn’t is important. Also - I don’t expect many crates to have 100% code reviewed / people doing a full overal review of the whole crate

Yes. I am thinking about it as well, comibined with an outside sharing of code reviews (so authors of code can’t get away with not including bad reviews). Though honest authors could include review score that is “(i wrote it, i undersdtand it, and I know it’s buggy”)

Interesting idea. As I believe the review should be per-file (which shouldn’t be 1MM lines in the first place), I guess it’s not as important, but maybe there is some applicability of it too. Thanks! I’ll think about it.


#9

TUF doesn’t solve the problems that I want to solve. For example, TUF trusts the author and doesn’t protect against the author going rogue. There have been cases where authors of Chrome extensions have accepted “monetization” deals and started injecting malware into their code. I would like independent reviews to catch that, and AFAIK this just doesn’t fit into TUF’s model.


#10

I believe TUF can be fairly easy extended with reviews functionality. Though it’s better to ask @bascule about it.


#11

I think integrity is crucial for reviews, otherwise weaknesses in the integrity can become weaknesses in the review system.

For example if you allow mixing file versions to increase coverage, there could be situation: In version 1 there’s a security check in file A, that file B relies on. In version 2 the check is moved to file B. Attacker can combine A-v2 and B-v1 into a “reviewed” crate that has no security check.


#12

That’s true, but I would rather add an additional crate/repository level review or even leave it for other tools, than not have reusability and efficiency of per-file reviews.

I’m looking for “git for code review” tool - social, decentralized, flexible. hackable. sitting next to the source code itself. In a similar spirit that artifact is a tool for design documentation. I am not necessarily trying to address code integrity and rouge author problem - that could be addressed by something higher-level, possibly using existing code review proofs to speed up the final “crate-level release-level” trust.


#13

TBH, this sounds antinomic with “own simple system (preferred), similar to https://github.com/carlos8f/salty” to me.

If going this way, please don’t make me have to maintain two different WoT, especially one with a new tool that would be unlikely to handle all cases I would want it to handle. Especially now that sequoia-pgp is there for a sane-defaults-and-easy-to-use-from-Rust implementation of OpenPGP :slight_smile:

Apart from that, it looks really cool! Though others have already pointed to potential issues about the reviews being per-file instead of per-crate, as there are potentially dependencies like “this file is safe iff this other file provides such and such guarantees for such function”. I’d think maybe having crate+diff reviews would be almost as scaleable (except for people who would manage to review only a file and find it trusted, but honestly I don’t get how).

In order to avoid the “this is only for Rust” issue, maybe have signature be per directory? Or even allow signatures on both files and directories, and let users do them as they prefer.

Anyway, good luck with this, I’ll be watching it!


#14

On the technical side I think SIT by @yrashk could be evolved in order to cover the use cases for Review. Throwing it in here as I don’t think many people are aware of SIT.


#15

I’ll be happy to help developing this case. Rust has a special place in my heart, and SIT is implemented in it.

Worth mentioning, I’m planning to get a new release out this month that will simplify the model by removing one abstraction (item) from the core, making it even more suitable for different applications I’m involved in. It’ll be an easy, seamless upgrade from 0.4.

I’ll dig through this thread to see if I can offer any further specific thoughts.


#16

How flexible is GPG/sequoia’s WoT? I imagine there may be domain-specific requirements, e.g. many people are extra concerned about unsafe {} blocks, so could demand higher level of trust for reviewing code with them.

I’m unsure whether being Rust-only is a good thing or not. I’m thinking of supporting Node too: one one hand that’s a huge ecosystem, so it might significantly increase interest in the project, but OTOH Node users would prefer implementation in JS, so that’s an extra work of separating protocol from implementation(s) and risk of scope creep. Being Rust-specific would allow focusing on Rust’s needs better.


#17

I am aware of salty, and I like it. Though I doubt it will ever gain any traction, since it doesn’t have a use case associated with it, so it will be hard to build a network effect around it.

I am actually planning to leave the door open for any signing system. Probably something like salty + GPG.

Native WoT don’t carry enough information. Just because you know someone PGP key is really theirs, doesn’t mean you trust their judgment, etc. That’s why native WoT of a given signing system, and WoT of code review don’t map 1:1.

I plan to make project and users to keep collecting signatures of who trusts what and whom (somewhere under $HOME/.config/ for user and along the source code itself for project), and then when analyzing dependencies, gathering them all (with the ones from dependencies) and calculating “WoT” for a given requirements ( thoroughness, etc.).

I envision that people could even collect “WoT” pieces inside of otherwise code-less repositories, just to easily include “shared, communal” WoT as part of their own projects, etc.

SIT seems very “meta”, flexible and complex. If it could handle anything like what I described, that would be great and I would consider using it. However if I am to implement it myself, I am going to target the smallest and most focused vision of “I’ve spent X time reviewing file Y, I understand it OKish and I think its OK/not OK, here is my signature” and “I’ve know Y and trust their reviews are reasonably good, here is my signature”.


#18

TBH, I didn’t know of salty until now. But I just read the first line of the description, “Salty is a nodejs-powered CLI-based alternative to PGP/GPG using NaCl instead of RSA/DSA.”

In my opinion, this is a big no-go. When the implementer of a cryptography library is mixing up a library (NaCL) with cryptographic algorithms (RSA and DSA), that’s just not a library I want to trust :slight_smile: (well, it’d maybe need the outcome of this project?) (also, PGP and GPG are pieces of software, I guess they meant OpenPGP, which is the protocol, as that’s what they should be comparing against)

A correct wording would have been “Salty is a nodejs-powered CLI-based alternative to OpenPGP using Curve25519 and Chacha20/Poly1305 instead of ???” (didn’t check which primitives from NaCl they actually use though).

I’ve put ??? though, because there’s no real thing that could be put here: the latest versions of OpenPGP allows RSA and DSA, sure, but also ElGamal, ECDSA, EdDSA and ECDH (which include Curve25519) for public/private key encryption/signature, and IDEA, 3DES, CAST5, Blowfish, AES, Twofish and Camellia (ok, so no Chacha20-line algorithms here, granted).

What I mean here, is that the very first line of the README mixes up very important things. I don’t consider myself a cryptographer enough to write a crypto library, but TBH I think I know enough about cryptography to spot this line as showing a quite important lack of knowledge in cryptography.

I’m sorry if I appear to attack the authors of salty, that’s really not my objective. Thinking one can write their own cryptographic thingies is a widespread misconception, among the first ones we learn in cryptography courses. But if one hasn’t taken said cryptography courses, it’s impossible to know it.

Also, it’s very possible that, being based on NaCl, the crypto is actually secure. But I, personally, wouldn’t bet anything on it, based on the first line of said README, and I’d rather have something made by Neal (ie. sequoia-pgp), who already made their traces in GnuPG :slight_smile:

(Also, I don’t really know whether this discussion about implementation is going off-topic or not, feel free to tell me to shut up if you think so!)

Very good point, though I think even knowing that 5 random developers have verified the code is already something that could have impact (at least for me) on deciding to use a library, even if I don’t trust them personally, so long as I know that it’s 5 actual real persons and not just 5 keypairs generated just for signing reviews :slight_smile: Hence my idea of maybe using OpenPGP to validate user keypairs, and then a separate key<->trust store for deciding the final value, with the ability to set a default trust for all keys that are valid-as-per-OpenPGP-WoT.

TBH, I don’t think trust in code verification should be transitive: because I trust someone for reviewing code doesn’t mean I trust them for guessing whether other people are good at reviewing code. That’s basically the same difference as between signatures and trust-signatures in OpenPGP parlance :slight_smile:

If you decide to go the OpenPGP route for this, you could do that with a custom signature notation for indicating the level of trust in reviewing code (don’t know whether sequoia allows easy access for that already or not, though). These custom signature notations could then be uploaded to the SKS keyservers, which would allow distribution of your code-review-specific WoT at the same time as the keys of people, avoiding the need for you to rewrite it all :slight_smile: (esp. given there are potential attacks like a key being compromised and the attacker deleting any revocation certificate uploaded that are taken care of with very strong attention on SKS, that would be very difficult to replicate on a custom-made repository)

I may sound like I advocate OpenPGP. TBH, I don’t really like it, but I know it relatively well. In my opinion, it is currently the least bad tool for this job, except if you want to re-implement everything yourself :slight_smile: (so long as it’s through a decent library, and I guess sequoia-pgp is as decent as a library can be, given the objectives it fixed itself)


#19

There could be a “fast path” for trust — if I say I completely trust a particular person, then it would be convenient if I could configure the system to automatically trust content of any git commit signed by them. This could be a way to get some “review” coverage even from sources that don’t explicitly participate in the system.


#20

I love the idea of such discovery. For example, if I’m contracting for a company, I can imagine needing a setting “for this particular project, trust sources that this company has chosen to trust” and import their WoT for their project.