docs: add fp-finder util sub command documentation #208
base: main
The `util` command includes sub-commands that are used from time to time and do not fit nicely into any of the other groups. Currently, the available sub-commands are:

* `renumber-tests`: Used to simplify maintenance of the regression tests. Since every test has a consecutive number within its file, adding or removing tests can disrupt numbering. `renumber-tests` will renumber all tests within each test file consecutively.
* `fp-finder`: Takes a file as input and outputs a filtered, alphabetically sorted list of unique words that are not present in the English dictionary. This can help in identifying potential false positives by focusing on unusual or unknown words.
* `fp-finder`: Takes a file as input and outputs a filtered, alphabetically sorted list of unique words that are not present in the English dictionary. This can help in identifying potential false positives by focusing on unusual or unknown words.
* `fp-finder`: Takes a file as input and produces a filtered, alphabetically sorted list of unique words that are not present in the English dictionary (WordNet). This can help in identifying potential false positives by focusing on unusual or unknown words.
Are you sure about the WordNet part? The dictionary (https://github.com/dwyl/english-words) doesn't seem to mention anything related to it.
Oh... When we refactored the original util, we settled on using WordNet, which is a known and well documented source. I had simply assumed that the repository you were working with supplied the same word list. I quickly checked and there are a couple of Go packages that provide an interface to the WordNet database. I'd really like to continue using WordNet. One, because it's well known, and two, because I want to ensure consistency when running the tool. Could I trouble you to modify your implementation to use WordNet?
Hmm... I read coreruleset/crs-toolchain#181 again and it seems there was a bit of a misunderstanding :)
Can you point me to the library that you think would make sense to use? I took a quick look at https://github.com/fluhus/gostuff/blob/master/nlp/wordnet/parser.go and it doesn't seem really equivalent to the `wn` command (that's why I proposed just running the CLI command from crs-toolchain). Honestly, I wasn't really satisfied with anything I found. Feedback is welcome!
This one doesn't look too bad: https://pkg.go.dev/github.com/lloyd/wnram.
Will require downloading the WN database, storing it in the cache dir (I imagine) and then parsing it from there.
Makes sense since the DB is ~80MB, wouldn't want that in the binary
DB available here: https://wordnet.princeton.edu/download
Needs proper attribution (like the original tool had).
Created coreruleset/crs-toolchain#229 for working on this.
Proposed changes
Add a quick mention in the doc about the new fp-finder subcommand for crs-toolchain