Missing good tools for bundling third party licenses

When using Rust to create binary applications (e.g. *.exe files for Windows), I might have to comply with a number of third party licenses.

Luckily, many crates are published under liberal (non-viral) licenses, and there is already cargo tree -f "{p} {l}", which lists all dependencies together with their license as an SPDX license expression.

There is also the cargo-deny crate, which allows you to automatically check that all licenses are either whitelisted or approved by the FSF or OSI (whichever you want to require). :+1:

However, even if many used licenses are pretty liberal, they still require attribution in regard to the author(s)/copyright holder(s). When I distribute something that includes 3rd party code (such as a binary), I have to ensure that all relevant licenses are bundled with my product. At the same time, I don't want to include licenses or authorship information for components that are not included in the final product (e.g. build dependencies or dependencies for other platforms).

I have found the following tools: cargo-about and cargo-bundle-licenses.

But testing these, I was greatly disappointed. For example, cargo-about seems to create different output depending on how I select a "threshold" for "detecting" licenses. I don't want to detect which license is used (I have cargo deny to do that), I simply want to copy all licenses and authorship information into a folder. The other choice, cargo-bundle-licenses, doesn't seem to be configurable in regard to the build target, and I don't like the output format options.

What other options do I have? I ended up writing a quick-and-dirty script that parses the output of

cargo tree --target=x86_64-pc-windows-msvc -e normal --prefix none

and then copies all LICENSE*, License*, license*, COPYING* files from ~/.cargo/registry/src/index.crates.io-…/{package}-{version}/. I then have to manually check if some packages have no license file and decide on a case-by-case basis what to do.
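A minimal Rust sketch of the parsing part of such a script could look like this. It assumes every line of cargo tree --prefix none output starts with "name vX.Y.Z", possibly followed by markers such as (proc-macro) or (*); the function names are made up for this example. The actual copying (e.g. with std::fs::read_dir and std::fs::copy) is left out because the registry directory name above is elided, and path or git dependencies don't live under that directory anyway.

```rust
use std::collections::BTreeSet;

/// Parses one line of `cargo tree --prefix none` output such as
/// `serde v1.0.197` or `serde_derive v1.0.197 (proc-macro)` into
/// `(name, version)`. Trailing markers like `(*)` are ignored.
/// Returns `None` for lines that don't look like a package entry.
fn parse_tree_line(line: &str) -> Option<(String, String)> {
    let mut parts = line.split_whitespace();
    let name = parts.next()?;
    let version = parts.next()?.strip_prefix('v')?;
    Some((name.to_string(), version.to_string()))
}

/// Deduplicated `{package}-{version}` directory names, matching the layout
/// of the registry source directory. Path and git dependencies (including
/// your own crate) also show up here but are not stored in the registry.
fn registry_dir_names(tree_output: &str) -> BTreeSet<String> {
    tree_output
        .lines()
        .filter_map(parse_tree_line)
        .map(|(name, version)| format!("{name}-{version}"))
        .collect()
}

/// Mirrors the patterns from the post: LICENSE*, License*, license*, COPYING*.
fn is_license_file(file_name: &str) -> bool {
    ["LICENSE", "License", "license", "COPYING"]
        .iter()
        .any(|prefix| file_name.starts_with(prefix))
}

/// True if no top-level file of a package matches the patterns above,
/// i.e. the package needs a case-by-case decision.
fn needs_manual_review(file_names: &[&str]) -> bool {
    !file_names.iter().any(|name| is_license_file(name))
}
```

Fed the output of the command above, registry_dir_names yields entries like serde-1.0.197, and needs_manual_review flags exactly the packages that still have to be checked by hand.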

It's not too much work to do this once, but it is somewhat painful when updating dependencies regularly in the future. :frowning_face:

Is there any better option?

Why doesn't crates.io require specifying a license file in addition to the SPDX identifier? As a package author, I'm even discouraged by cargo from specifying the license file in the manifest:

warning: only one of license or license-file is necessary

license should be used if the package license can be expressed with a standard SPDX expression.

license-file should be used if the package uses a non-standard license.
See https://doc.rust-lang.org/cargo/reference/manifest.html#the-license-and-license-file-fields for more information.

I feel it's a design flaw not to support (or even require) a more machine-processable way of managing license attribution when it comes to the actual license texts (i.e. word-for-word, including the copyright notices).


I agree the tooling is lacking. If I didn't already maintain too many crates, I'd write a tool for this :slight_smile:

It isn't required because the SPDX identifier already tells you, in an unambiguous and machine-readable way, what the license is, so you don't need the file in the metadata as the source of truth. There are templates for all SPDX licenses, so if you need to include the license text, you can (re)generate it from the template. For projects offering multiple licenses combined with OR, the machine-readability of SPDX also lets you include only the license you've chosen, not all of them.
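To illustrate the OR point, here is a rough Rust sketch that picks one license out of a dual-licensed crate's expression, given a preference list. It deliberately understands only a flat list of identifiers joined by OR; anything containing AND, WITH, or parentheses yields None so that a human (or a real SPDX expression parser) can handle it. The function names and the preference-list approach are assumptions for this example.

```rust
/// Splits a flat SPDX expression like `MIT OR Apache-2.0` into its
/// alternatives. Returns `None` for anything more complex (AND, WITH,
/// parentheses), which this sketch does not try to interpret.
fn or_alternatives(expr: &str) -> Option<Vec<&str>> {
    let alts: Vec<&str> = expr.split(" OR ").map(str::trim).collect();
    let simple = alts.iter().all(|id| {
        !id.is_empty() && !id.contains(' ') && !id.contains('(') && !id.contains(')')
    });
    simple.then_some(alts)
}

/// Returns the first license from `preferred` that the expression offers.
fn choose_license<'a>(expr: &'a str, preferred: &[&str]) -> Option<&'a str> {
    let alts = or_alternatives(expr)?;
    preferred
        .iter()
        .find_map(|p| alts.iter().copied().find(|a| a == p))
}
```

With a preference list of ["Apache-2.0", "MIT"], the common "MIT OR Apache-2.0" resolves to Apache-2.0, so only that one license text (plus whatever it requires) needs to be bundled.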

With a file, you run into the problem of having to determine whether differences between the file and the standard license template are just the template being filled in (with things like name and year), or whether the changes alter the terms of the license. That isn't machine-readable, and heuristics create risk.

So the license-file field is not meant for a file that is equivalent to the SPDX license identifier, because that would just be a redundant, worse version of the data you already have. It's only a necessary fallback for licenses that can't be expressed in SPDX.

Having said that, most open-source licenses do require distributing a copy, so you still need to include the license text somewhere. But I think it should be treated as a derived file: take Cargo.toml, read license, and generate the COPYING boilerplate from that, not the other way around.
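A sketch of that "derived file" direction, under heavy assumptions: read_license scans Cargo.toml line by line instead of using a real TOML parser (so it breaks on trailing comments, multi-line strings and the like), and render_copying fills a template whose <year> and <copyright holders> placeholders are made up here rather than taken from the official SPDX template markup. Both function names are hypothetical.

```rust
/// Naively extracts the value of `license = "..."` from the [package] table.
/// Assumes a single-line basic string with no trailing comment; a real tool
/// would use a TOML parser or `cargo metadata` instead.
fn read_license(manifest: &str) -> Option<String> {
    let mut in_package = false;
    for line in manifest.lines().map(str::trim) {
        if line.starts_with('[') {
            // Track whether we are currently inside the [package] table.
            in_package = line == "[package]";
        } else if in_package {
            if let Some((key, value)) = line.split_once('=') {
                // `license-file` has a different key and is skipped here.
                if key.trim() == "license" {
                    return Some(value.trim().trim_matches('"').to_string());
                }
            }
        }
    }
    None
}

/// Fills the hypothetical `<year>` and `<copyright holders>` placeholders
/// of a license template with concrete values.
fn render_copying(template: &str, year: &str, holders: &str) -> String {
    template
        .replace("<year>", year)
        .replace("<copyright holders>", holders)
}
```

Note that the year and holders have to come from somewhere other than the license field itself, which is the crux of the disagreement further down the thread.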


Note that the Apache license, which many Rust crates use, also requires you to include the attribution notices from its NOTICE file, if present, in addition to the license text.
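For a bundling script, this means NOTICE files need to be picked up as well. The function below is a hypothetical helper, not part of any existing tool: it takes a crate's license expression and its top-level file names, keeps LICENSE* and COPYING* files (case-insensitively), and adds NOTICE* files whenever Apache-2.0 appears anywhere in the expression. Keeping NOTICE even when you pick the MIT side of "MIT OR Apache-2.0" is a deliberately conservative choice.

```rust
/// Selects the files of one crate that should go into the bundle.
/// LICENSE* and COPYING* are always kept (case-insensitively); NOTICE* is
/// kept whenever Apache-2.0 appears in the SPDX expression, because
/// section 4(d) of that license covers the NOTICE file.
fn files_to_bundle<'a>(license_expr: &str, file_names: &[&'a str]) -> Vec<&'a str> {
    let mentions_apache = license_expr
        .split(|c: char| c.is_whitespace() || c == '(' || c == ')')
        .any(|id| id == "Apache-2.0");
    file_names
        .iter()
        .copied()
        .filter(|name| {
            let upper = name.to_ascii_uppercase();
            upper.starts_with("LICENSE")
                || upper.starts_with("COPYING")
                || (mentions_apache && upper.starts_with("NOTICE"))
        })
        .collect()
}
```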


First of all, many thanks for your responses!

I don't need to be able to automatically find out what type of license is used.[1] In theory, I have the SPDX identifier, which I may trust to warn me if I use a bad license.

With this post, I wanted to address a different problem. As you say…

…SPDX licenses are "templates", so the actual license texts will differ in wording. And the licenses usually require that they be shipped with distributions of the software:

"The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software."

(in case of MIT), or (in case of Apache 2.0)

"You must give any other recipients of the Work or Derivative Works a copy of this License; […]"

and

"If the Work includes a “NOTICE” text file as part of its distribution, then any Derivative Works that You distribute must include a readable copy of the attribution notices contained within such NOTICE file, […]"

Many thanks to @2e71828 for NOTICEing that!

I'm not a lawyer, but I don't think shipping a different (regenerated) license text fulfills the licenses' requirements. If it is a word-for-word equivalent, then yes. But in the case of the heavily used MIT license, the copyright notice (naming the copyright holder) is part of what must be included:

"The above copyright notice […] shall be included […]"

Regenerating the text from scratch would specifically violate this (very liberal) license. I need an automated way to copy the LICENSE files that contain these copyright notices (and/or the NOTICE file in the case of Apache 2.0 licensing).

While I agree that the license-file field alone wouldn't be sufficient (e.g. due to the clause in the Apache License requiring a NOTICE file to be included, if one exists), it is not redundant. As you said, licenses are templates, and dropping the actual license text in favor of just the SPDX identifier loses information. That loss is exactly what would keep me from meeting the license requirements (shipping copies of the actual license text, which may differ from the template because certain text is filled in).

What I think a crate's metadata needs is a required, exhaustive list of files to be shipped when redistributing the software. Of course this list could be wrong, but then the crate could be flagged as erroneous in a database (just like a vulnerability, because it sort of misleads you in a dangerous way).
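To make the proposal more concrete, here is a Rust sketch of the consistency check a registry or lint could run. The redistribute list is a hypothetical field (Cargo has nothing like it), and the sketch only verifies that the list is non-empty and that every listed file is actually part of the packaged crate. Whether the list is exhaustive can't be proven this way; that is where flagging crates after the fact would come in.

```rust
use std::collections::BTreeSet;

/// Hypothetical manifest data: `redistribute` is NOT an existing Cargo field;
/// it stands in for the proposed list of files that must be shipped.
struct PackageMeta<'a> {
    redistribute: Vec<&'a str>,
}

/// Problems a registry or lint could report for such a list.
#[derive(Debug, PartialEq)]
enum MetaError<'a> {
    /// The crate declares nothing to ship at all.
    EmptyList,
    /// A listed file is not part of the published package.
    MissingFile(&'a str),
}

/// Checks the declared list against the files actually packaged.
/// This shows the list is consistent, not that it is exhaustive.
fn validate<'a>(meta: &PackageMeta<'a>, packaged: &BTreeSet<&str>) -> Vec<MetaError<'a>> {
    if meta.redistribute.is_empty() {
        return vec![MetaError::EmptyList];
    }
    meta.redistribute
        .iter()
        .copied()
        .filter(|file| !packaged.contains(file))
        .map(MetaError::MissingFile)
        .collect()
}
```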

Why do I think there is demand for this? It's idiomatic to split functionality into many (small) crates, so I can easily end up with hundreds or even thousands of dependencies when using Rust. That makes Rust unsuitable for creating and shipping binaries (or packages that include binaries): it's practically impossible to comply with even very liberal licenses (ones that allow binary distribution), because tracking every package's license requirements and manually verifying that all the information is correct is very hard for a single maintainer, especially when providing regular updates.


  1. Well, I would like to verify it automatically, but that's a different problem to address.


Contrary to the original premise of this thread, I no longer believe that the proper solution is a tool. Instead, I think the Cargo.toml metadata needs to be extended in order to solve this cleanly.

I opened an issue on this:


Perhaps there should be a field specifically for the copyright lines? Debian's packaging has this (the Copyright field in its machine-readable debian/copyright file).

Because the rest of the license text must be equivalent by definition of the SPDX identifier. If there's any meaningful change (e.g. someone added a sentence with an extra requirement or inserted "not" somewhere), then it's not the license any more, and the license field would be disagreeing with the license file.
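As a rough idea of what such a field would hold, this Rust sketch pulls lines starting with "Copyright", "(c)" or "©" (case-insensitively for the first two) out of a license file. It is only a heuristic for illustration: it misses notices embedded mid-paragraph and could pick up wrapped lines that happen to begin with "copyright", which is why an explicit metadata field would be more reliable than scraping. The function name is hypothetical.

```rust
/// Heuristically extracts copyright notice lines from a license text.
/// Only whole lines starting with "copyright", "(c)" (any case) or "©"
/// are returned, trimmed of surrounding whitespace.
fn copyright_lines(license_text: &str) -> Vec<&str> {
    license_text
        .lines()
        .map(str::trim)
        .filter(|line| {
            let lower = line.to_lowercase();
            lower.starts_with("copyright") || lower.starts_with("(c)") || line.starts_with('©')
        })
        .collect()
}
```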

I think you are wrong. (Edit, sorry if this came off as personal: I think that's not correct.) It's not only certain "lines". Often parts of the text of the license itself are replaced with the actual copyright holders or the product name. See the red parts here, for example:

this software and associated documentation files

could be replaced with a package or software name, and

THE AUTHORS OR COPYRIGHT HOLDERS

could be replaced by the actual copyright holders, e.g. a Foundation's name.

If a crate publisher legitimately modifies these parts, I need to ship this modified version with my bundle. I can't recreate a license file from scratch; that would violate the license terms.

See also SPDX Annex B, "License matching guidelines and templates (Informative)", in particular subsection B.3.4.
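The point can be illustrated with a simplified matcher, sketched in Rust. Replaceable text is marked as <<...>> in the template, loosely modeled on SPDX's <<var;...>> markup but without its name/original/match attributes; whitespace is normalized, and each marker matches any text up to the next literal segment, without backtracking. The captured strings are exactly what would be lost by regenerating the file from the template. An unrestricted wildcard like this would also happily absorb an inserted "not" inside a replaceable span, which is one reason the real SPDX templates constrain each variable with a match regular expression. All names here are hypothetical.

```rust
/// Collapses all runs of whitespace into single spaces (and trims the ends).
fn normalize(s: &str) -> String {
    s.split_whitespace().collect::<Vec<_>>().join(" ")
}

/// Matches `text` against `template`, where replaceable text is marked `<<...>>`.
/// Returns the text found at each marker, or None if the literal parts differ.
fn match_template(template: &str, text: &str) -> Option<Vec<String>> {
    // Split the template into the literal segments between markers.
    let mut literals = Vec::new();
    let mut rest = template;
    while let Some(start) = rest.find("<<") {
        let end = start + rest[start..].find(">>")?;
        literals.push(normalize(&rest[..start]));
        rest = &rest[end + 2..];
    }
    literals.push(normalize(rest));

    let text = normalize(text);
    // The text must start with the first literal segment.
    if !text.starts_with(literals[0].as_str()) {
        return None;
    }
    let mut pos = literals[0].len();
    let mut captures = Vec::new();
    let last = literals.len() - 1;
    for (i, lit) in literals.iter().enumerate().skip(1) {
        let hay = &text[pos..];
        // Take everything up to the first occurrence of the next literal
        // (no backtracking); a trailing marker takes the rest of the text.
        let idx = if i == last && lit.is_empty() {
            hay.len()
        } else {
            hay.find(lit.as_str())?
        };
        captures.push(hay[..idx].trim().to_string());
        pos += idx + lit.len();
    }
    // Everything must be consumed, otherwise the text has extra content.
    (pos == text.len()).then_some(captures)
}
```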


A particular example is the SPDX identifier MIT-modern-variant, where even the replaceable text in the template(!) contains the university's name:

IN NO EVENT SHALL THE UNIVERSITY OF CALIFORNIA BE LIABLE TO ANY […]

If I were to use that "modern variant" of the license, I would of course have to replace the name of the university with my own organization's name.


Of course, but dealing with malicious packages is a separate problem that doesn't need to be solved by the mechanism that helps me deal with benign crates. Crates that insert a "not" into the license text (where the "not" really negates certain rights) while using a wrong or misleading entry in the license metadata field would hopefully be banned from crates.io once this is discovered and made widely known.
