Skip to content

Add copy/pasting detection #3149

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 12 commits into
base: main
Choose a base branch
from
Open

Add copy/pasting detection #3149

wants to merge 12 commits into from

Conversation

CarolineDenis
Copy link
Contributor

@CarolineDenis CarolineDenis commented Mar 10, 2023

@CarolineDenis CarolineDenis requested review from maxpatiiuk and a team March 10, 2023 22:56
@chanulee1
Copy link
Contributor

How do I test this?

@maxpatiiuk
Copy link
Member

This is likely not ready for testing yet because the front-end tests are failing - but that's the intention of the change

The change will make front-end tests fail if code duplication is detected.

As soon as code duplication issues are fixed, the tests will be pass, and thus this will be ready for merging

@maxpatiiuk
Copy link
Member

I could see one issue with this though, not sure it's important enough to fix, but something to consider:

I made a change where back-end test are only run on back-end changes, and front-end tests are run on front-end changes.

However, the code duplication check is checking both front-end and back-end - yet is only run if the front-end changes.

The worst case is that a pull request that added code duplication on the back-end gets merged, and then front-end test will start failing for people.

The fix for this would be to modify test.yml to have a third category of tests - the tests that run always, regardless of what files where changed

@maxpatiiuk
Copy link
Member

a nice side-effect of adding this library:
it prints a nice summary of the size of the code base - https://github.com/specify/specify7/actions/runs/4407952016/jobs/7722317168

Screenshot 2023-03-13 at 12 57 07

(and it ignores the test files and third-party libraries, so the numbers are not inflated like in sp6)

Shows that front-end is 23946+ 29135+ 49773+ 483 = 103,337 lines of code
Back-end is about 18352+15+755 = 19,122

Copy link
Member

@maxpatiiuk maxpatiiuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All good (besides the failing tests)

@maxpatiiuk
Copy link
Member

@melton-jason can you please take a look into the following code duplication occurrences:

Clone found (python):
 - /home/runner/work/specify7/specify7/specifyweb/workbench/upload/tomany.py [59:20 - 70:3] (11 lines, 130 tokens)
   /home/runner/work/specify7/specify7/specifyweb/workbench/upload/upload_table.py [94:19 - 105:7]

Clone found (python):
 - /home/runner/work/specify7/specify7/specifyweb/workbench/upload/tomany.py [92:9 - 111:2] (19 lines, 249 tokens)
   /home/runner/work/specify7/specify7/specifyweb/workbench/upload/upload_table.py [182:9 - 201:2]

see how easy it is to fix them (if deduplicating them is adding a lot of complexity, then we can silence the duplication check for those two occurrences)

@maxpatiiuk
Copy link
Member

@realVinayak can you also look into resolving the following. should be relatively simple

Clone found (python):
 - /Users/maxpatiiuk/site/python/specify7/specifyweb/context/collection_resources.py [14:68 - 40:2] (26 lines, 194 tokens)
   /Users/maxpatiiuk/site/python/specify7/specifyweb/context/user_resources.py [31:89 - 57:17]

Clone found (python):
 - /Users/maxpatiiuk/site/python/specify7/specifyweb/context/collection_resources.py [55:9 - 79:13] (24 lines, 205 tokens)
   /Users/maxpatiiuk/site/python/specify7/specifyweb/context/user_resources.py [58:9 - 82:14]

Clone found (python):
 - /Users/maxpatiiuk/site/python/specify7/specifyweb/context/collection_resources.py [95:71 - 118:72] (23 lines, 194 tokens)
   /Users/maxpatiiuk/site/python/specify7/specifyweb/context/user_resources.py [97:92 - 120:101]

Clone found (python):
 - /Users/maxpatiiuk/site/python/specify7/specifyweb/context/collection_resources.py [118:72 - 136:6] (18 lines, 149 tokens)
   /Users/maxpatiiuk/site/python/specify7/specifyweb/context/collection_resources.py [39:79 - 57:6]

@melton-jason
Copy link
Contributor

melton-jason commented Apr 20, 2023

@melton-jason can you please take a look into the following code duplication occurrences:

Clone found (python):
 - /home/runner/work/specify7/specify7/specifyweb/workbench/upload/tomany.py [59:20 - 70:3] (11 lines, 130 tokens)
   /home/runner/work/specify7/specify7/specifyweb/workbench/upload/upload_table.py [94:19 - 105:7]

Clone found (python):
 - /home/runner/work/specify7/specify7/specifyweb/workbench/upload/tomany.py [92:9 - 111:2] (19 lines, 249 tokens)
   /home/runner/work/specify7/specify7/specifyweb/workbench/upload/upload_table.py [182:9 - 201:2]

see how easy it is to fix them (if deduplicating them is adding a lot of complexity, then we can silence the duplication check for those two occurrences)

These duplicates are related to the fact that subclasses of Named Tuples are not meant to be subclassed themselves.
More specifically, subclassing is allowed, but new attributes/data can not be added to the NamedTuple, only methods.
We have conversed about this issue before, in #3240 (comment)

In short, the backend is extensively using Named Tuples to store information about the status of an upload.
Named Tuples are convenient because it is very easy to convert between Named Tuples and JSON:

def parse_upload_table(collection, table: Table, to_parse: Dict) -> UploadTable:
def rel_table(key: str) -> Table:
return datamodel.get_table_strict(table.get_relationship(key).relatedModelName)
return UploadTable(
name=table.django_name,
wbcols={k: parse_column_options(v) for k,v in to_parse['wbcols'].items()},
static=to_parse['static'],
toOne={
key: parse_uploadable(collection, rel_table(key), to_one)
for key, to_one in to_parse['toOne'].items()
},
toMany={
key: [parse_to_many_record(collection, rel_table(key), record) for record in to_manys]
for key, to_manys in to_parse['toMany'].items()
}
)

and NamedTuples are immutable once created.

Through the process of uploading or validating a workbench upload plan, the backend 'converts' NamedTuple classes into ones that have more information or are more 'specialized', up until the actual upload/validation. (A typical example of the conversion process is UploadTable -> ScopedUploadTable -> BoundUploadTable)

Because we can not subclass a generic UploadTable in ScopedUploadTable, BoundUploadTable, etc., we have to reuse a lot of the 'generic' functionality common to all of them.

@maxpatiiuk
Copy link
Member

@melton-jason not a problem we can leave it as is given that it is presenting issues. We have the capability to silence those errors by adding comments like this - 9c52034#diff-c8cfcd0e13a828573c0d195ee26ce1cbb513e3554e37dcb0860e9053ffa16568R73-R121

Though, the issues in #3149 (comment) would be easy to fix so should be fixed

@carlosmbe carlosmbe removed the request for review from a team June 13, 2023 17:35
@CarolineDenis
Copy link
Contributor Author

@melton-jason do we want to merge this at some point?

@melton-jason
Copy link
Contributor

Yes, we can look into merging this.
We should probably silence the copy/paste detection as a result of #3149 (comment) and look into the errors from
#3149 (comment)

@carlosmbe
Copy link
Contributor

Is this a feature that requires any UX Testing? @CarolineDenis

@CarolineDenis
Copy link
Contributor Author

@carlosmbe , no it does not

@CarolineDenis
Copy link
Contributor Author

@specify/dev-testing
What do we want to do with this PR?

@maxpatiiuk
Copy link
Member

Would love to see this merged.
If you need help, I can work on this this Saturday

@realVinayak
Copy link
Contributor

realVinayak commented Jun 19, 2024

regarding backend workbench, is there a way to silence the workbench ones for now? They should be fixed in this PR - I reworked the matching, and deleted everything from the tomany.py file 😄 (made the logic completely generic, no need for that file anymore, see diff).

Comment on lines +10 to +17
# OS
.DS_Store

# Editors
/.vscode/
/.idea/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used to be against OS or editor specific gitignore patterns, and was recommending using global gitignore for that but I changed my mind.

Reasons:

  • Setting up global gitignore is added friction, especially for new team members or external contributors.
  • Automated tools (like jscpd, eslint, prettier) and some IDEs do not look into global gitignore file

@maxpatiiuk
Copy link
Member

maxpatiiuk commented Jun 22, 2024

I just realized that jscpd will also help us detect incorrect merge resolutions (i.e cases when a deleted file got incorrectly "resurected" during merging). Pretty nice

@maxpatiiuk
Copy link
Member

Testing instructions:

  • Ensure the Form Editor/App Resource buttons at the top of the app resource still work and look good
  • Ensure the Table Format/Table Aggregation buttons at the top of a "Record Formaters" visual editor still look good
  • Open User Tools -> Operations API and make sure the documentation for the /context/user_resource/* and /context/collection_resource/* endpoints looks similar or better to prodcution

SCHEMA['toManyIndependent'] &
SCHEMA['toOneDependent'] &
SCHEMA['toOneIndependent'])[FIELD_NAME]
FIELD_NAME extends SchemaFields<SCHEMA>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-run npx tsc --extendedDiagnostics several times with and without this change
seems like this change makes typescript type-checking of entire project faster between 0s and 1s (possibly because TypeScript can cache the resolved type?)

@@ -37,6 +37,7 @@ def get_cols(self) -> Set[str]:
| set(col for u in self.toOne.values() for col in u.get_cols()) \
| set(col for rs in self.toMany.values() for r in rs for col in r.get_cols())

# jscpd:ignore-start
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

regarding backend workbench, is there a way to silence the workbench ones for now? They should be fixed in this #4929 - I reworked the matching, and deleted everything from the tomany.py file 😄 (made the logic completely generic, no need for that file anymore, see diff).

Sounds good. Leaving these to you

@maxpatiiuk maxpatiiuk requested review from acwhite211, realVinayak and Areyes42 and removed request for realVinayak and melton-jason June 22, 2024 23:26
@CarolineDenis CarolineDenis added this to the 7.9.9 milestone Jul 30, 2024
@CarolineDenis
Copy link
Contributor Author

@maxpatiiuk,
Do you know if there is a possibility to have this ran when the dev runs a command locally instead of having it as failing test / passing test in our actions?

@maxpatiiuk
Copy link
Member

You can run the check locally at any point by calling this command from the js_src folder:

npx tsx lib/tests/jscpd.ts

We could have it run automatically in a pre-commit hook, however the command takes 11s to check entire specify7 repository, so this would slow down commits considerably.

I know that IntelliJ IDEs had this built-in live in the editor. Maybe there is a VS Code plugin for that too - but I imagine it would slow down the IDE a bit.

@maxpatiiuk maxpatiiuk requested review from sharadsw and removed request for realVinayak December 10, 2024 03:07
@pashiav pashiav requested a review from a team January 21, 2025 15:07
Copy link
Contributor

@pashiav pashiav left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Verify that the Tabs buttons in app resources still look good after the change
  • Verify that the "Table Format"/"Table Aggregation" tabs in the
  • Go to User Tools -> Operations API -> Verify that the documentation for the /context/user_resource/* and /context/collection_resource/* endpoints looks good (or better) than production

Looks good so far! For the "Table Format"/"Table Aggregation" tabs, what exactly should we be focusing on testing here?

@maxpatiiuk
Copy link
Member

maxpatiiuk commented Jan 23, 2025

ups! looks like I lost my train of thought when I was typing that 😆
I meant to say that they are supposed to look identical on this branch to how they do on production branch

Triggered by 61bc8d7 on branch refs/heads/issue-2996
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: 📋Back Log
Development

Successfully merging this pull request may close these issues.

Add a code duplication scanner
8 participants