Beyond SPDX: expanding licenses identified by ClearlyDefined

ClearlyDefined is an Open Source project that helps organizations with supply chain compliance. Until recently, ClearlyDefined’s tooling only supported licenses on the standardized SPDX license list. Any component whose license was not on this list was reported as NOASSERTION, which left the permissible use of that component uncertain and could hinder collaboration, create legal complexity, and raise security concerns for developers.

Fortunately, ScanCode, which is integral to how ClearlyDefined detects and normalizes the origin, dependencies, and licensing metadata of components, already supports non-SPDX licenses thanks to its use of LicenseDB. LicenseDB is the largest free and open database of software licenses, covering in particular all Open Source software licenses, with over 2000 community-curated license texts and their metadata.

Philippe Ombredanne, the lead author of ScanCode and LicenseDB, made the case for ClearlyDefined using this capability, which ScanCode already provides:

As one of many examples, common public domain dedications are not tracked nor supported by SPDX and are not approved as OSI licenses. Not a single lawyer I know is treating these as proprietary licenses. They are carefully cataloged and properly detected by ScanCode (at least 850+ variants of these at last count plus an infinity of variations detected approximately)…

Collecting data is not endorsing nor promoting anything in particular be it proprietary, open source, free software, source available or else. But rather, just accepting that the world of actual licenses is what it is in all its glorious messy diversity and capturing what these licenses are, without discarding valuable information detected by ScanCode. Discarding and losing data has been the problem until now and has been making ClearlyDefined data mostly harmless and useless at scale as you get better and more information out of a straight ScanCode scan.

You are welcome to use anything you like, but I think it would be better to adopt the de-facto industry standard of ScanCode license data, rather than to reinvent the wheel, especially since ClearlyDefined is kinda using ScanCode rather heavily.

We use a suffix as LicenseRef-scancode in https://scancode-licensedb.aboutcode.org/ and guarantee stability of these with the track record to prove this.

After a healthy discussion on the topic, the ClearlyDefined community agreed that supporting non-SPDX licenses was important. ScanCode already provides this functionality and maps these non-SPDX licenses to SPDX LicenseRef identifiers. Organizations using ClearlyDefined can now decide how to handle non-SPDX licenses based on their own needs. The work to have ClearlyDefined use the latest version of ScanCode and support non-SPDX licenses was led by Lukas Spieß from GitHub, with stewardship from Qing Tomlinson (from SAP) and E. Lynette Rayle (also from GitHub). We would like to thank them and everyone involved in developing and testing this implementation.
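To illustrate what deciding "how to handle non-SPDX licenses" might look like for a consumer of this data, here is a minimal sketch in Python. It is not ClearlyDefined or ScanCode code. The function name, the tiny hard-coded set of SPDX identifiers, and the sample expression are all assumptions made for illustration; a real check would load the full SPDX license list and use a proper license expression parser.

```python
import re

# Illustrative subset only (assumption); a real check would use the full SPDX list.
KNOWN_SPDX_IDS = {"MIT", "Apache-2.0", "BSD-3-Clause", "GPL-2.0-only"}

# Prefix used for ScanCode LicenseDB keys expressed as SPDX LicenseRef identifiers.
SCANCODE_PREFIX = "LicenseRef-scancode-"

# Operators that can appear in a license expression.
OPERATORS = {"AND", "OR", "WITH"}


def classify_license_ids(expression):
    """Group the identifiers in a license expression by kind (hypothetical helper)."""
    groups = {"spdx": [], "scancode_ref": [], "other": []}
    # Split on whitespace and parentheses; this is a crude tokenizer, not a parser.
    for token in re.findall(r"[^\s()]+", expression):
        if token.upper() in OPERATORS:
            continue
        if token in KNOWN_SPDX_IDS:
            groups["spdx"].append(token)
        elif token.startswith(SCANCODE_PREFIX):
            groups["scancode_ref"].append(token)
        else:
            # Includes NOASSERTION and anything this sketch does not recognize.
            groups["other"].append(token)
    return groups


# Example expression (made up for illustration).
result = classify_license_ids(
    "MIT AND (LicenseRef-scancode-public-domain OR NOASSERTION)"
)
print(result)
```

An organization could then apply its own policy to each group, for example accepting the `spdx` entries automatically, routing `scancode_ref` entries to legal review, and treating `other` as unknown.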

We are looking for feedback. Please test this feature on dev.clearlydefined.io or dev-api.clearlydefined.io and file any issues here.