This repository was archived by the owner on May 27, 2024. It is now read-only.

Better handle licenses that have SPDX templates/variables (e.g. MIT, BSD) #16

Closed
silverhook opened this issue Jan 23, 2019 · 14 comments

Comments

@silverhook
Collaborator

Currently the 2.0 spec asks for licenses which include the copyright statement in their text to:

  • have LicenseRef- prepended
  • have -<copyright_owner> appended

This, I suspect, is in order to store the license texts as they were found (i.e. with different copyright holders in them), to distinguish them from each other, and not to clash with the SPDX spec.

Now, while the SPDX spec on the one hand does say that non-standard licenses should be prepended with LicenseRef-, its standard license texts are also in the form of a template, so if the variable strings (e.g. the copyright statement and copyright holder in BSD-3-Clause) are modified, the text would still match and be a valid SPDX license (e.g. BSD-3-Clause).

But if we use the LicenseRef- prefix, it implies that the license is not on the SPDX list, which is not true.

One option could be that we handle the MIT/BSD/… special case as is suggested in https://git.fsfe.org/reuse/included-hello/issues/1 – i.e. to also add a Valid-License-Identifier: pointing to a valid SPDX license to the license text itself, so e.g. the headers in the code should read:

// SPDX-License-Identifier: LicenseRef-MIT-Microsoft

and the header of the license text in LICENSES/LicenseRef-MIT-Microsoft.txt should read:

Valid-License-Identifier: LicenseRef-MIT-Microsoft
Valid-License-Identifier: MIT
License-Text:
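
As a rough sketch of how a tool could read such a header: the function name `valid_identifiers` is hypothetical, and the header format is only the one proposed in this comment, not an established one.

```python
def valid_identifiers(license_file_text: str) -> list:
    """Collect Valid-License-Identifier values from the header of a license file."""
    identifiers = []
    for line in license_file_text.splitlines():
        line = line.strip()
        if line.startswith("License-Text:"):
            break  # the header ends here; the rest is the license text itself
        if line.startswith("Valid-License-Identifier:"):
            identifiers.append(line.split(":", 1)[1].strip())
    return identifiers


# Example header, as proposed for LICENSES/LicenseRef-MIT-Microsoft.txt
header = """Valid-License-Identifier: LicenseRef-MIT-Microsoft
Valid-License-Identifier: MIT
License-Text:

MIT License
"""
print(valid_identifiers(header))
```

With a header like this, a tool could follow the LicenseRef- name in the source file to the license file and from there to the plain SPDX ID.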

Another option would be to simply state MIT (or BSD-3-Clause) in the licensing header and rely on the copyright notice together with the SPDX ID to be the differentiator for the license text, so e.g.:

the following copyright notice header:

// © 2014 CompanyX and ProjectZ contributors
//
// SPDX-License-Identifier: MIT

would simply be linked to the LICENSES/LicenseRef-MIT-CompanyX-and-ProjectZ-contributors.txt license text file, which would include the MIT license text with the copyright holder/statement corrected and would not need its own licensing text header, as the name of the license file is auto-generated (and can be followed back) by combining the SPDX ID with the copyright holder string.
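
A minimal sketch of that auto-generation, assuming the notice has the shape "© <year> <holder>" and that any run of characters other than letters, digits and dots in the holder string becomes a single hyphen. Both assumptions, and the names `parse_header` and `license_file_name`, are mine and not part of any spec.

```python
import re


def parse_header(header: str):
    """Return (spdx_id, copyright_holder) from a source file comment header."""
    spdx_id = re.search(r"SPDX-License-Identifier:\s*(\S+)", header).group(1)
    # Assumed notice shape: "© <year> <holder>", optionally with a year range
    holder = re.search(r"©\s*\d{4}(?:-\d{4})?\s+(.+)", header).group(1).strip()
    return spdx_id, holder


def license_file_name(spdx_id: str, holder: str) -> str:
    """Combine the SPDX ID with the copyright holder string into a file name."""
    slug = re.sub(r"[^A-Za-z0-9.]+", "-", holder).strip("-")
    return f"LICENSES/LicenseRef-{spdx_id}-{slug}.txt"


header = (
    "// © 2014 CompanyX and ProjectZ contributors\n"
    "//\n"
    "// SPDX-License-Identifier: MIT\n"
)
print(license_file_name(*parse_header(header)))
```

Real copyright notices vary far more than this pattern allows, so the parsing step is the fragile part of this option.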

With both of these options, either a person or a machine could follow from the source code’s header (to the actual license text file and then) to the SPDX ID of the license.

A third option could be to just say all of these fall within the scope of the template and not bother with the details. But that is probably the least favourable option.

In any case, it would make sense to reach out to SPDX and work on this together. I feel the solution is just within reach, but we need to agree on it.

@pombredanne

@silverhook IMHO the copyright is not part of the license terms, hence a mere copyright statement change does not trigger the creation of a new id or LicenseRef. That's always been the intent with SPDX alright.
There are rare cases where a copyright is part of the license text, as in the GPL text, where the FSF copyright is part of the text and applies to the license text and not the license. In other cases, these would be one-off, non-reusable licenses IMHO.

The thing that makes things confusing has been the inclusion of template/placeholders for a copyright statement in SPDX texts (and in some cases in OSI texts too). That's IMHO to provide guidance and nothing else.

@silverhook
Collaborator Author

The thing that makes things confusing has been the inclusion of template/placeholders for a copyright statement in SPDX texts (and in some cases in OSI texts too). That's IMHO to provide guidance and nothing else.

I agree, but the problem is that the copyright notice currently is part of the SPDX templates. So either we get rid of it in the SPDX templates, or we have to find a way to deal with it.

@pombredanne

@silverhook

the problem is that the copyright notice currently is part of the SPDX templates

This is a primeval flaw alright. The simple solution is to get rid of them if you want to have canonical reusable license texts. That's what we do on ScanCode for SPDX-listed and non-SPDX licenses.

@silverhook
Collaborator Author

Right, but in that case you need to make an exception to using SPDX texts for those kinds of licenses.

The options I see are still the same as those I mentioned above.

Relevant: spdx/license-list-XML#523

P.S. @carmenbianca, @mxmehl: it seems the included-hello example, together with https://git.fsfe.org/reuse/included-hello/issues/1 has been taken down. Where has it moved to?

@mxmehl
Member

mxmehl commented May 16, 2019

P.S. @carmenbianca, @mxmehl: it seems the included-hello example, together with https://git.fsfe.org/reuse/included-hello/issues/1 has been taken down. Where has it moved to?

The back-then-beta archiving function of our gitea triggered a strange edge case which rendered the repo unusable. To be honest I didn't think of the issues and deleted the repo altogether.

If it is of large importance to you, I can try a restore. Just tell me, but it could take some days before I find time.

@silverhook
Collaborator Author

@mxmehl, it would help. But I can also try to remember what I wrote there. The issue hasn’t changed. But it will take me time to remember and write it down again.

@carmenbianca
Member

A fourth option perhaps:

Include MIT.txt from https://github.com/spdx/license-list-data:

MIT License

Copyright (c) <year> <copyright holders>

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is furnished
to do so, subject to the following conditions:

The above copyright notice and this permission notice (including the next
paragraph) shall be included in all copies or substantial portions of the
Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS
OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF
OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

... and then provide a script that fills in the two variables (year, copyright holders) for all copyright holders found within the project.

Some problems:

  • The resulting license file may not 100% match the original file due to an error in translation.

  • It's not easy to interpret what is a year, a name or an address in copyright notices, so the process of extracting this information from the project would involve a lot of heuristics.

  • Someone has to actually execute that script. The person who creates the tarball? The person who receives the tarball? Nobody, because who's going to care anyway?
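
A minimal sketch of the fill-in step alone, assuming the year and the holder names have already been extracted (which is the hard, heuristic part mentioned above). The function name `fill_template` is hypothetical, and only the first lines of MIT.txt are reproduced here.

```python
# Only the title and notice line of MIT.txt are shown; the full text would be handled the same way.
MIT_TEMPLATE = "MIT License\n\nCopyright (c) <year> <copyright holders>\n"


def fill_template(template: str, year: str, holders: str) -> str:
    """Replace the two placeholder variables used in the plain-text MIT.txt."""
    return template.replace("<year>", year).replace("<copyright holders>", holders)


filled = fill_template(MIT_TEMPLATE, "2014", "CompanyX and ProjectZ contributors")
print(filled)
```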

@silverhook
Collaborator Author

silverhook commented May 31, 2019

@carmenbianca, that is also not a bad idea. But I see some of the issues with it differently than you do.

The SPDX License List in its source XML stores the licenses in templates, where appropriate.

Optional parts are enclosed in an <optional> tag, so the license text would officially match its SPDX equivalent whether that text is there or omitted.

Parts where different strings are valid are enclosed in an <alt> tag, which includes further information and matching rules (e.g. <alt match=".+" name="copyright">).

If you look e.g. at MIT License (and its XML source), the following would still be an SPDX valid MIT License:

MIT License

Copyright (c) 2019-far_far_futar Code Monkey <[email protected]>

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and 
associated documentation files (the "Software"), to deal in the Software without restriction, including 
without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 
copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to 
the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial
portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS 
FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS 
OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, 
WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN 
CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

As you can see, I entered a funky copyright statement and removed the optional text (including the next paragraph), but it should still qualify as SPDX-License-Identifier: MIT.

So the matching is not really the problem.
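
To illustrate the idea, here is a much simplified sketch of template matching with one <alt> and one <optional> part. It is not the SPDX matching tooling: the segment list is hand-written and abbreviated (only the title, the copyright line and the notice paragraph), the names are hypothetical, and only whitespace is normalised.

```python
import re

# Abbreviated, hand-written stand-in for the MIT template; not the real license XML.
SEGMENTS = [
    ("text", "MIT License"),
    ("text", "Copyright (c)"),
    ("alt", ".+"),  # corresponds to <alt match=".+" name="copyright">
    ("text", "The above copyright notice and this permission notice"),
    ("optional", "(including the next paragraph)"),
    ("text", "shall be included in all copies or substantial portions of the Software."),
]


def words_pattern(text: str) -> str:
    """Match the words literally while tolerating any whitespace between them."""
    return r"\s+".join(re.escape(word) for word in text.split())


def build_pattern(segments) -> str:
    parts = []
    for kind, value in segments:
        if kind == "text":
            parts.append(words_pattern(value))
        elif kind == "optional":
            parts.append("(?:" + words_pattern(value) + ")?")
        elif kind == "alt":
            parts.append("(?:" + value + ")")
    return r"\s*".join(parts)


def matches(candidate: str) -> bool:
    return re.fullmatch(build_pattern(SEGMENTS), candidate.strip()) is not None


candidate = """MIT License

Copyright (c) 2019-far_far_futar Code Monkey

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
"""
print(matches(candidate))
```

Under these assumptions the funky copyright line is absorbed by the alt pattern, and the text matches whether or not the optional part is present.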

The problem arises though, when there are several copyright holders who (want to) have their copyright statements in the license text itself – e.g. if the project/package includes MIT-licensed code of different origins. In that case, you may have a naming clash for all the License Files that are actually valid SPDX-License-Identifier: MIT, but differ only in the copyright holder. E.g., if you already have a file in LICENSES/MIT.txt, how do you name the next License File that differs only in the copyright holder line? LICENSES/MIT.txt.1, LICENSES/MIT.txt.some_other_©_holder, LICENSES/MIT.some_other_©_holder.txt, or LICENSES/LicenseRef-MIT-some_other_©_holder.txt, or what?

If we still end up having to juggle several different MIT (or BSD, …) License Files only because of a different copyright holder, it might make sense to select a separator that’s not used in SPDX License Identifiers. That way tools can rely on where in the file name the SPDX License Identifier ends and the copyright holder (or other) variation starts.
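
A small sketch of what that would buy us, assuming a double underscore as the separator. The separator and the function name are purely hypothetical choices for illustration; the only property relied on is that SPDX License Identifiers do not contain underscores.

```python
from pathlib import PurePosixPath

SEPARATOR = "__"  # hypothetical separator, chosen only for this example


def split_license_file_name(path: str):
    """Return (spdx_id, variation); variation is None for a plain SPDX file name."""
    stem = PurePosixPath(path).name.removesuffix(".txt")
    spdx_id, sep, variation = stem.partition(SEPARATOR)
    return spdx_id, (variation if sep else None)


print(split_license_file_name("LICENSES/MIT.txt"))
print(split_license_file_name("LICENSES/BSD-3-Clause__some_other_holder.txt"))
```

Splitting on the first occurrence of the separator means the variation part can contain anything, including hyphens and further underscores.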

The most elegant solution, as @pombredanne indicated, would be to kick the copyright statement out of the license texts in SPDX templates. Or at least make the copyright statement an optional part of the license text (preferably one that’s off by default in the plain text version).

Pulling in @kestewart, @jlovejoy, @goneall for ideas.

@goneall

goneall commented Jun 2, 2019

I just need a bit more context - is the issue related to matching or to capturing the notice text for a disclosure (or both)?

There is an issue I run across rather frequently where some organizations I work with prefer to identify each BSD or MIT license with a different copyright as a different license. This is due to a process they use to capture the notice text for disclosures even though the license terms and obligations are the same within a particular license family.

In terms of matching the copyrightText, the guidelines mark this as replaceable (or variable) text, but not optional.

There is a copyright element in the licenseXML schema, but it currently is not being translated to alt or optional template fields by the tool. It would be a good improvement to add this to the publisher tool.
I added spdx/LicenseListPublisher#48 to track.

The license XML for MIT already includes alt text after copyright (c) but does not enclose the word copyright with alt - so the word copyright must be included for a match.

The BSD 3-clause does not have any alt or optional elements but it does enclose the copyright in a copyrightText element.

@carmenbianca
Member

There is an issue I run across rather frequently where some organizations I work with prefer to identify each BSD or MIT license with a different copyright as a different license

@goneall This is the problem. MIT (and BSD) have a clause saying that the copyright notice must be included in copies of the software. But if I copy MIT.txt from spdx/license-list-data, then obviously the copyright notice is not included. The question is how the user can comply:

  • Can you copy the copyright notice into a comment header, and consider yourself compliant?

  • Should you just copy the entire license text and create a new license LicenseRef-MIT-Alice?

  • Is there another way around this?

The problem with option number 2 is that it is extraordinarily laborious. You have to create all these separate licenses and remember to link to the correct LicenseRef-MIT-* instead of MIT in your comment headers, for every instance of MIT code you have.

The spec that is currently online at https://reuse.software/ but is due for replacement mandates the second option. #23 gets rid of this recommendation and does not mention this problem at all, which officially makes it Not Our Problem™. But it would still be good to have a solid recommendation for the FAQ, because people are invariably going to run into this.

@zvr

zvr commented Jun 2, 2019

The SPDX document for the file should have the field FileCopyrightText, which accurately records the copyright notice for the file.

@goneall

goneall commented Jun 2, 2019

@carmenbianca Thanks for the context. I haven't found a perfect solution for the attribution generation, but I'll include a couple of opinions below FWIW.

For matching using the SPDX license matching guidelines and associated tools, the copyright will be ignored either because it is currently wrapped in an <alt> or <optional> element or because it is wrapped in a <copyrightText> element. The <copyrightText> element will automatically generate template code which will ignore the full copyright text when we publish the next release of the license list.

In terms of capturing the information for attribution, my favorite approach would be to maintain a complete SPDX document which includes FileNotice and FileCopyrightText tags along with using the standard SPDX license IDs. If maintaining an SPDX document isn't well supported by the build tools, then perhaps using the proposed SPDX copyright text tags in the source files would be a good alternative. My only hesitation is I don't know if this would meet the legal requirements of including the notice. That would be a good question for the lawyers. If the legal consensus is we need the full license text, then we should go down the path of storing the full notice either in the code or in a separate file as discussed above.

@silverhook
Collaborator Author

This issue should be (all but) solved upstream when @goneall updates the license list publishing system (see spdx/license-list-XML#866 (comment))

@mxmehl
Member

mxmehl commented Jul 20, 2020

Closed in favour of #48

@mxmehl mxmehl closed this as completed Jul 20, 2020

6 participants