
docs: add fp-finder util sub command documentation #208


Draft: wants to merge 1 commit into `main`
`content/6-development/6-2-crs-toolchain.md`: 6 changes (5 additions & 1 deletion)
````diff
@@ -60,6 +60,7 @@ The output should be:
 The level of logging can be adjusted with the `--log-level` option. Accepted values are `trace`, `debug`, `info`, `warn`, `error`, `fatal`, `panic`, and `disabled`. The default level is `info`.
 
 ## Full Documentation
+
 Read the built-in help text for the full documentation:
 
 ```bash
````

````diff
@@ -102,7 +103,10 @@ The `format` sub-command reports formatting violations and actively formats asse
 
 ## The `util` Command
 
-The `util` command includes sub-commands that are used from time to time and do not fit nicely into any of the other groups. Currently, the only sub-command is `renumber-tests`. `renumber-tests` is used to simplify maintenance of the regression tests. Since every test has a consecutive number within its file, adding or removing tests can disrupt numbering. `renumber-tests` will renumber all tests within each test file consecutively.
+The `util` command includes sub-commands that are used from time to time and do not fit nicely into any of the other groups. Currently, the available sub-commands are:
+
+* `renumber-tests`: Used to simplify maintenance of the regression tests. Since every test has a consecutive number within its file, adding or removing tests can disrupt numbering. `renumber-tests` will renumber all tests within each test file consecutively.
+* `fp-finder`: Takes a file as input and outputs a filtered, alphabetically sorted list of unique words that are not present in the English dictionary. This can help in identifying potential false positives by focusing on unusual or unknown words.
````
Contributor

Suggested change

```diff
-* `fp-finder`: Takes a file as input and outputs a filtered, alphabetically sorted list of unique words that are not present in the English dictionary. This can help in identifying potential false positives by focusing on unusual or unknown words.
+* `fp-finder`: Takes a file as input and produces a filtered, alphabetically sorted list of unique words that are not present in the English dictionary (WordNet). This can help in identifying potential false positives by focusing on unusual or unknown words.
```

Author

Are you sure about the WordNet part? The dictionary (https://github.com/dwyl/english-words) doesn't seem to mention anything related to it.

Contributor

Oh... When we refactored the original util, we settled on using WordNet, which is a known and well documented source. I had simply assumed that the repository you were working with supplied the same word list. I quickly checked and there are a couple of Go packages that provide an interface to the WordNet database. I'd really like to continue using WordNet. One, because it's well known, and two, because I want to ensure consistency when running the tool. Could I trouble you to modify your implementation to use WordNet?

Author

Hmm... I reread coreruleset/crs-toolchain#181, and it seems there was a bit of a misunderstanding :)

Can you point me to the library that you think would make sense to use? I quickly took a look at https://github.com/fluhus/gostuff/blob/master/nlp/wordnet/parser.go and it doesn't seem really equivalent to the `wn` command (that's why I proposed just running the CLI command from crs-toolchain). Honestly, I was not really satisfied with anything I found. Feedback is welcome!

Contributor

This one doesn't look too bad: https://pkg.go.dev/github.com/lloyd/wnram.
It will require downloading the WN database, storing it in the cache dir (I imagine), and then parsing it from there.

Makes sense, since the DB is ~80MB; we wouldn't want that in the binary.

DB available here: https://wordnet.princeton.edu/download
Needs proper attribution (like the original tool had).

Author

Created coreruleset/crs-toolchain#229 for working on this.

````diff
 
 ## The `completion` command
 
````