`cargo tree` for license expression generation

I'm curious about the level of "processing" cargo tree does in producing its output, specifically when using --format "{l}" to dump license expressions.

It's no surprise to most that there are challenges in packaging compiled Rust binaries for OSS Linux distributions. One of the biggest is working out the combined set of licenses that cover the package, effectively a combination of the licenses for every crate that goes into the final build.

cargo tree --format "{l}" certainly has the potential to assist in compiling that data, with the correct options. So far, it's been working pretty well to extract a flat list using a command along these lines:

cargo tree --workspace --edges no-build,no-dev,no-proc-macro \
--no-dedupe --target all --prefix none --format "{l}" \
| sed -e "s: / :/:g" -e "s:/: OR :g" | sort -u

(The sed is because some crates list their license options like "MIT/Apache-2.0".)
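
To make the effect of the two sed expressions concrete, here is a minimal sketch that runs them on hand-written sample lines rather than on real cargo tree output. The function name normalize_licenses and the sample license strings are assumptions made for illustration only.

```shell
#!/bin/sh
# normalize_licenses: hypothetical helper wrapping the sed/sort step above.
# Reads license strings on stdin, rewrites slash-separated lists as
# "OR" expressions, and prints the unique, sorted result.
normalize_licenses() {
    sed -e "s: / :/:g" -e "s:/: OR :g" | sort -u
}

# Made-up sample input standing in for `cargo tree --format "{l}"` output.
printf '%s\n' \
    'MIT/Apache-2.0' \
    'MIT / Apache-2.0' \
    'MIT OR Apache-2.0' \
    'BSD-3-Clause' \
    | normalize_licenses
```

The three spellings of the MIT/Apache-2.0 choice collapse into a single "MIT OR Apache-2.0" line. Note that this is purely textual: a crate that writes "Apache-2.0/MIT" would still come out as a separate "Apache-2.0 OR MIT" line.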

My question is: Do we really need the --no-dedupe in there? It's there, to the best of my knowledge, on the assumption that the deduplication may lose information — if a certain crate has different features enabled each time it's consumed, that would affect either the license displayed or the list of its own dependencies (and therefore their set of licenses).

But, can that actually happen? Is the {l} output customized each time it's output?

If it's just whatever is listed after license = in each crate's Cargo.toml, then it'll be the same for that crate each time the crate is listed.

And if features can alter the list of dependencies for a crate (I'm not sure whether that actually is possible), then would feature unification also ensure that the dependencies listed are the union of all of the dependencies needed for all features, as well?

Unless there is a possibility of information being concealed, it doesn't seem like the --no-dedupe is really accomplishing anything.

I'm pretty sure that --format {l} just shows the package.license key from packages' Cargo.toml files, but I don't have a reference to prove that.

Note that it's technically possible that enabling a feature of a crate will cause the crate's license [1] to change, which would happen if the code for the feature is under a different license than the rest of the crate. Cargo's license metadata is not capable of describing situations like this, though.

Yes, it is possible for features to alter a crate's list of dependencies. See this section of the Cargo Book.

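Yes, with feature unification the dependencies listed are the union of those needed by all of the enabled features.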


  1. Specifically, what the license should be if it were expressed properly, not the license as reported by cargo tree or listed in package.license.

Thanks, @jwodder!

Agreed, on both counts. I have no doubt that the strictly-correct license expression for a crate "should" change based on the set of dependencies it's pulling in (as customized by the feature set enabled). But if cargo tree can neither capture nor express those nuances, with or without --no-dedupe, then it sounds like we're really not buying ourselves anything by dumping out the exhaustive list.

(We'll just need to add another operation to the sed call, -e 's: (\*)$::', to eliminate the deduplication markers on the lines where they appear.)
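
As a rough sketch of that revised pipeline, the following applies the extra expression to made-up sample lines in which one entry carries the " (*)" marker. The helper name license_set and the sample data are assumptions; real input would come from the cargo tree command above with --no-dedupe dropped.

```shell
#!/bin/sh
# license_set: hypothetical helper. Strips a trailing " (*)" deduplication
# marker, then applies the same slash-to-OR rewrite and unique sort as before.
license_set() {
    sed -e 's: (\*)$::' -e "s: / :/:g" -e "s:/: OR :g" | sort -u
}

# Made-up sample: the last line is marked as a repeat of an earlier entry.
printf '%s\n' \
    'MIT/Apache-2.0' \
    'Unlicense OR MIT' \
    'MIT/Apache-2.0 (*)' \
    | license_set
```

With the marker stripped, the repeated entry sorts and deduplicates together with its first occurrence, so the flat list for this sample is the same two lines it would be without the repeat.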

I'm not entirely sure what you are trying to do, but it looks like you want to generate a file with all licenses for a binary crate?

Have you looked at cargo-about (EmbarkStudios/cargo-about on GitHub), a Cargo plugin to generate a list of all licenses for a crate?

I believe there are some other tools as well that do similar jobs, but this is the one I have used.