Update Licensee and Licensed gems #3982
This pulls in Licensed 0.10.0 too.
Licensed now enforces this as it's easier than guessing.
@jonabc & @benbalter: if either of you have a few spare cycles, I could really do with a little bit of your expert knowledge on optimising our usage and getting our tests passing. At the moment, the tests are failing because Licensee isn't correctly detecting the license of quite a few grammars. Most of these grammars have BSD 3 clause licenses. If I remove the cached license and then re-add it using our Any pointers you can offer would be greatly appreciated. 🙇
I've been experimenting using the go-tmbundle grammar as it comes up in our failing tests. The latest Licensee run from the command line reports:
... but if you look at the license file, you'll see it's a BSD 3 Clause license and this has been detected correctly on GitHub.com.
@lildude 👋 sure I'm happy to help.
This part is a little weird to me, running Could you run me through the workflow that is failing and the workflow for updating cached license metadata?
Sure, but this isn't really about workflow at the moment... this is about our tests verifying what we've already got. Note: all links and output are for my branch in this PR.

I'm going to concentrate on just one test for the moment as I suspect fixing this will resolve one of the other failing tests too. In our tests in ... is now finding several grammars no longer have approved licenses as they're all now detected as "other" as you can see in the test failure at https://travis-ci.org/github/linguist/jobs/327212197#L1462

Sidenote: I needed to do a little tweaking of If I manually run Licensee against that same directory, it too reports "other", as I showed earlier. I see the same thing if I do it manually in
So to me this says Licensee has changed. So I thought I'd check what Licensed does, as I know it uses Licensee. I removed the cached license
So for some reason, calling Licensee directly to determine the license results in "other" whilst using Licensed results in the expected BSD 3 Clause license and I can't work out why.
@lildude ah, I didn't realize the context was in using taking a look
script/licensed
@@ -40,7 +40,7 @@ OptionParser.new do |opts|
   end
 end.parse!
 
-source = Licensed::Source::Filesystem.new(module_path || "vendor/grammars/*/", type: "grammar")
+source = Licensed::Source::Filesystem.new(module_path || "#{File.expand_path("../", File.dirname(__FILE__))}/vendor/grammars/*/", type: "grammar")
FYI in local testing, I had to remove the trailing "/" from the file path to properly find license data 🤷♂️. Otherwise my cached files looked like
---
name: <name>
type: grammars
license: none
---
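For anyone reproducing the trailing "/" observation, here is a minimal sketch of what a glob like the one in `script/licensed` hands back. It uses only the Ruby standard library; `grammar_dirs` and `strip_trailing_slash` are hypothetical helper names for illustration, not part of Licensed, and the sketch makes no claim about how `Licensed::Source::Filesystem` treats these paths internally.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical helper: list grammar directories under a project root.
# A glob pattern ending in "/" matches only directories, and every
# result keeps that trailing slash.
def grammar_dirs(root)
  Dir.glob(File.join(root, "vendor", "grammars", "*/")).sort
end

# Hypothetical helper: drop the trailing separator so each entry is a
# plain directory path.
def strip_trailing_slash(paths)
  paths.map { |path| path.chomp("/") }
end

# Throwaway directory tree standing in for vendor/grammars.
Dir.mktmpdir do |root|
  %w[abap.tmbundle go-tmbundle].each do |name|
    FileUtils.mkdir_p(File.join(root, "vendor", "grammars", name))
  end

  raw = grammar_dirs(root)
  puts raw.all? { |path| path.end_with?("/") }
  puts strip_trailing_slash(raw).map { |path| File.basename(path) }.inspect
end
```

If the cached metadata comes out as `license: none`, comparing the raw and stripped paths like this is one quick way to check whether the separator is what differs.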
👍 Once we sort out the Licensee part of things, I'd like to steal a bit more of your time and take you up on your suggestion of switching this to use Licensed::Source::Manifest, but only when you've got the time - I still need to give it a go too 😄
a few things going on here: Using the If I run the same command without
Second, the reason that the file isn't being detected as

@benbalter this has turned into a question of

@lildude I have some ideas for how to make your situation easier but don't have more time at the moment to think through everything. Will get back to you ASAP on that.
Doh!! Of course. 😊
NP. Thanks for taking the time today.
👋 sorry for missing this @mention. As @jonabc mentioned:
That is an intended behavior. Computers aren't smart enough (yet) to parse that trailing statement and know if it's legally significant or not, so it says it's not confident enough to call the license BSD-3-clause. The idea being that a human can review it, and if the content is not legally significant, you can whitelist the license via licensed's config.
Thanks for the explanation @benbalter. My main issue is this appears to be quite a significant change from the behaviour seen with the antiquated version Linguist is using at the moment, but your explanation certainly makes sense. I'll go the route of whitelisting those that are flagged. Thanks.
We don't need smarter computers, only dumber lawyers. In fact, the entire justice system would make more sense if "Law" was a programming language that compiled down to ones and zeroes dictating if somebody receives a legal arse-kick.
Ahh. Didn't realize you were coming from such an old version. Yes, the underlying mechanism to match licenses has improved a lot, in part, to catch, for example, if in the above example the line said "Parts of this project licensed under GPL-v3". What you're describing is an intended behavior, even if a (breaking) change from previous versions.
Don't we already have a whitelist for these licenses? I'd much prefer to whitelist a hash of the current license rather than a whole repository. At least, if a repository starts using a copyleft license, we'll see it. |
Yup, but only for some of the licenses caught by the old behaviour. The issue I've found is the newer Licensee is catching issues with licenses that earlier versions didn't mainly because peeps have deviated from the standard license text. For example, the go-tmbundle grammar I've referenced in my comments has this teeny weeny modification at the end of the license which previously wasn't detected:
Others flagged by the test have similar deviations from the standard license text.
I don't think this aspect changes. AFAICS I'll need to add the hashes for the newly flagged licenses and we may need to do this more often in future if peeps insist on adding clauses or comments that steer them away from the standard license text.
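As a rough illustration of why whitelisting a hash is narrower than whitelisting a whole repository, here is a minimal sketch using only the Ruby standard library. `APPROVED_KEYS`, `WHITELISTED_HASHES`, and `license_acceptable?` are hypothetical names and values made up for this example; they are not Linguist's actual test code or lists.

```ruby
require "digest"

# Hypothetical data for illustration only.
APPROVED_KEYS = %w[mit bsd-3-clause apache-2.0].freeze
WHITELISTED_HASHES = [
  Digest::SHA1.hexdigest("custom license text") # a reviewed, non-standard license
].freeze

# A grammar passes if its detected license key is approved, or if the
# exact text of its license has been reviewed and whitelisted by hash.
def license_acceptable?(detected_key, license_text)
  return true if APPROVED_KEYS.include?(detected_key)

  WHITELISTED_HASHES.include?(Digest::SHA1.hexdigest(license_text))
end

puts license_acceptable?("other", "custom license text")
puts license_acceptable?("other", "custom license text, now copyleft")
```

Because the whitelist entry is tied to the text, any later edit to the license produces a different hash and the grammar is flagged again for review.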
"966085b715baa0b0b67b40924123f92f90acd0ba", # sublime-shen
"3df4ef028c6384b64bc59b8861d6c52093b2116d", # sublime-text-ox
"fd47e09f1fbdb3c26e2960d0aa2b8535bbc31188", # sublimetext-cuda-cpp
"93360925b1805be2b3f0a18e207649fcb524b991", # Std license in README.md of many TextMate grammars like abap.tmbundle
All these changes are because Licensee now strips out "All rights reserved" text and markup before calculating the hash. This results in better license detection and also detection of identical licenses in the README of a lot of the TextMate grammars like that in abap.tmbundle. It also means that some of the old entries that remain now have a new hash.
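To make the effect concrete, here is a minimal sketch of hashing after normalization, using only the Ruby standard library. The normalization rules in `normalize_license` are assumptions chosen for illustration and are much cruder than whatever Licensee actually does; the sample texts are invented.

```ruby
require "digest"

# Assumed, simplified normalization: not Licensee's real rules.
def normalize_license(text)
  text
    .gsub(/all rights reserved\.?/i, "") # drop the boilerplate phrase
    .gsub(/[*_#`>]/, "")                 # drop common Markdown markup characters
    .downcase
    .split                               # split on any run of whitespace
    .join(" ")                           # and rejoin with single spaces
end

def license_hash(text)
  Digest::SHA1.hexdigest(normalize_license(text))
end

plain = "Copyright (c) 2012 Example Author\nAll rights reserved.\n\n" \
        "Redistribution and use in source and binary forms are permitted."
markdown = "**Copyright (c) 2012 Example Author**\n\n" \
           "> Redistribution and use in source and binary forms are permitted."

puts license_hash(plain) == license_hash(markdown)
```

Two files that differ only in markup or in the "All rights reserved" line collapse to one hash under this scheme, which is also why a hash computed from the raw text no longer matches once normalization is applied first.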
@lildude any feedback/changes/things we can do within Licensee to make your life easier, please let me know. One thing to note, if you clone Licensee locally and
Thanks @benbalter. I'm still tinkering locally with Licensee to see what I can do with it. Thanks for these new deets... I hadn't found these yet. Your most recent update does bring up a question though: Why is "Closest licenses" returning a license with a much lower similarity than the license you've explicitly asked to check? I've not dug into how this is done in Licensee yet, but on the face of it, I'd have expected "BSD 3-Clause" to more likely have been identified as closest than "NCSA".
We used to use the much more expensive Levenshtein distance to match licenses. To speed things up, we compared license lengths and found the closest N by number of characters and only compared those licenses. We now are using Dice which is much faster, but we are still limiting potential matches due to limitations with how wordset matching may allow for false positives. That particular debug output should probably match all licenses and output the best match, not just those that are similar in length.
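For readers unfamiliar with the two ideas mentioned here, this is a minimal sketch of a Sørensen-Dice coefficient over word sets plus a length-based pre-filter, using only the Ruby standard library. `wordset`, `dice_similarity`, and `closest_by_length` are hypothetical names, and the tokenization is an assumption; none of this is Licensee's actual implementation.

```ruby
require "set"

# Assumed tokenization: lowercase alphanumeric words, order and repeats ignored.
def wordset(text)
  text.downcase.scan(/[a-z0-9]+/).to_set
end

# Sørensen-Dice coefficient, 2 * |A ∩ B| / (|A| + |B|), as a percentage.
def dice_similarity(a, b)
  set_a = wordset(a)
  set_b = wordset(b)
  total = set_a.size + set_b.size
  return 0.0 if total.zero?

  200.0 * (set_a & set_b).size / total
end

# Keep only the n templates whose length is closest to the candidate text.
def closest_by_length(text, templates, n)
  templates.min_by(n) { |_key, body| (body.length - text.length).abs }.to_h
end

template = "Redistribution and use in source and binary forms are permitted"
modified = "#{template} Parts of this project are under another license"

puts dice_similarity(template, template)
puts dice_similarity(template, modified) < dice_similarity(template, template)
```

A word set discards order and repetition, so an extra trailing sentence lowers the score without removing any of the shared words. The length pre-filter also explains the debug output in question: a template outside the closest N by length is never scored, however similar its wording.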
@lildude I've updated the licensed dependency to 1.0.0 with the minimum changes needed to keep things working. v1 also brings some other changes that I haven't made here:
- default for licensed v1.0 changed from `vendor/licenses` to `.licenses`
- default configuration file location changed from `vendor/licenses/config.yml` to `.licensed.yml`

Would you like me to push updates for these changes as well, or leave it as-is?
🙇 Really appreciate it.
On the one hand I think leave as-is, but on the other hand, we probably should be one of the leaders in the usage of Licensed now it's public.
I'm going to throw this out there and say, let's keep things as-is for the moment. We can address this in the future if we decide that's the route we want to take. For the moment, I'd like to get this merged so I can make a new release 🔜
We're lagging a long way behind on these two gems and regularly encounter problems with the incorrect license being detected on newly added grammars.
This PR updates the Licensee and Licensed gems to newer versions and updates our usage.