Skip to content

Improve CollectionRelationship uploads in the WorkBench #3240

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 38 commits into from
May 26, 2023

Conversation

melton-jason
Copy link
Contributor

@melton-jason melton-jason commented Mar 22, 2023

Fixes #3089


This pull request includes:

  • Autogenerated default forms for CollectionRelationship and CollectionRelType

    • I believe Specify 6 contains a view definition for CollectionRelType. Which of course does not work in Specify 7. So primarily ensure that the default form for CollectionRelationship is OK.
  • Apply the proper schema overrides for CollectionRelationship and CollectionRelType.

    • CollectionRelType was made requried for CollectionRelationship
    • name was made required for CollectionRelType
  • Makes CollectionRelType.name a frontend-only picklist with name as its field

    • CollectionRelType.name is how the collections are supposed to be determined during the upload
    • Because this is such an important field, this change will ensure that only existing names are used
  • Adds the ability to override which collection a column will upload into

    • To accomplish this, go into the uploadplan (A video is attached detailing this process)

    • Find the correct relationship/table/column you want to override

    • As a key/value pair for an uploadTable, add "overrideScope" : {"collection" : COLLECTION_ID}, where COLLECTION_ID is your desired collection to upload into

    • NOTE: This override should work for any table being uploaded through the WorkBench, not just CollectionRelationships


To View Upload Plan

wb_show_plan.mp4

* a video showing how to view the [DEV] Show Plan button


Collection Override Example

uploadPlan

* an example of a collection override being applied to the rightside of a CollectionRelationship, with the additions marked by an orange box

Because this table is required for Collection Relationship and it
resolves the left side and right side collections through the
RelType name.
This makes it so that only the available/valid names appear in the
Workbench when uploading Collection Relationships
As a property of any 'uploadTable' in the upload plan,
the following can be added to manually set the desired
collection to upload records into for that column

'overrideScope' : { 'collection' : <COLLECTION_ID>}

where <COLLECTION_ID> is the id of the collection.
@melton-jason
Copy link
Contributor Author

The only remaining thing to accomplish with this Pull Request is to have the backend automatically override the rightside collection when uploading a CollectionRelationship.

The intended behavior can also currently be accomplished by adding a manual collection override for the rightside field in the WorkBench upload plan
(see my steps outlined in the intial PR comment under the Adds the ability to override which collection a column will upload into section for instructions to test this).

Copy link
Member

@maxpatiiuk maxpatiiuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great!

So if I understood correctly, front-end doesn't need to fill out overrideScope, as back-end will do it automatically for CollectionRelatinships, but if it would be necessary, someone can manually edit the upload plan?

Also, if I enter overrideScope, then edit upload plan in the mapper and save, the overrideScope won't be preserved? Not an issue, but something to be aware of

@melton-jason
Copy link
Contributor Author

melton-jason commented Mar 22, 2023

So if I understood correctly, front-end doesn't need to fill out overrideScope, as back-end will do it automatically for CollectionRelatinships, but if it would be necessary, someone can manually edit the upload plan?

Yes, to stay as compatible as possible with past workbenches, I made overrideScope optional in the upload plan. We will need to add it automatically on the backend for things like CollectionRelationships.

And yes, if needed someone can manually edit the upload plan and tell the backend which collection to validate and upload into. With this change, some use cases might be uploading Collection Objects in multiple collections with one WorkBench dataset, or upload/validate PickLists for a separate collection, etc...
I tried to make it as easy to scale as possible, so in the future we can provide customizability for every scope.
If this is a popular idea, we could provide some UI in the WorkBench that would just add the overrideScope to the upload plan.

Also, if I enter overrideScope, then edit upload plan in the mapper and save, the overrideScope won't be preserved? Not an issue, but something to be aware of

The upload plan is stored in the database. Editing and then saving the upload plan saves it to the database, so overrrideScope will be preserved.

@maxpatiiuk
Copy link
Member

I like how you implemented it, but I don't think there would be that much need for this beyond CollectionRelationships - most of the time, if users want to upload into a different collection, they would just switch to that collection

@melton-jason melton-jason requested review from a team and maxpatiiuk March 22, 2023 22:04
Copy link
Member

@maxpatiiuk maxpatiiuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code for deferring scope looks tricky. Sorry can't comprehend it right now as I don't know well what's going on here.

If there is a way to simplify it then do it, otherwise document what the DeferedUploadPlan concept means and how it interacts with UploadPlan. Also, let's keep up the tradition of workbench have very good test coverage and add test cases for the newly added code - that will also help explain to others what's the expected behavior

@melton-jason
Copy link
Contributor Author

This is ready to be tested. I will fix any failing tests and add comments/documentation either tonight or tomorrow, but all of the logic is in place.

When testing, please make sure that right-side collection objects get uploaded/matched correctly in the rightSideCollection of the collection rel type. (Also, test to ensure that it validates and uploads with the correct collection's Catalog Number formatting).

Please also do brief testing of common workbench uploads/functions, as well as for recordsets.

@grantfitzsimmons
Copy link
Member

I am so impressed just reading your summary and code. Excited to try this and so happy to see this!!!

@melton-jason
Copy link
Contributor Author

melton-jason commented Mar 23, 2023

I was considering my changes and realized there is a bug right now with uploading CollectionRelationships with more than one CollectionRelTypes that have differing rightSideCollections.

Take this WorkBench dataset for example:
example_colrel_dataset

If the HostFish CollectionRelType has a separate rightSideCollection than TestRel, it will be ignored and upload that row's rightSide CollectionObject 111 into TestRel's rightSideCollection.

I am working on a fix, but expect more complicated WorkBench code!

You can continue to test this as normal, I am simply pointing out I am aware of this bug to prevent redundancy

@melton-jason
Copy link
Contributor Author

melton-jason commented Mar 24, 2023

I was considering my changes and realized there is a bug right now with uploading CollectionRelationships with more than one CollectionRelTypes that have differing rightSideCollections.

Take this WorkBench dataset for example:
example_colrel_dataset

If the HostFish CollectionRelType has a separate rightSideCollection than TestRel, it will be ignored and upload that row's rightSide CollectionObject 111 into TestRel's rightSideCollection.

I am working on a fix, but expect more complicated WorkBench code!

You can continue to test this as normal, I am simply pointing out I am aware of this bug to prevent redundancy

This should now be fixed.

Assuming no more bugs present themselves, all that is remaining is to add documentation in the code and update/write tests.

@realVinayak
Copy link
Contributor

I'm getting this error when I try to save this mapping.
image

@grantfitzsimmons
Copy link
Member

I'm getting this error when I try to save this mapping. image

Can you request changes on the PR?

Copy link
Contributor

@realVinayak realVinayak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Saving mapping throws an error -
image

@maxpatiiuk
Copy link
Member

NOTE: This override should work for any table being uploaded through the WorkBench, not just CollectionRelationships

Does it check for permissions when uploading into a different collection?

@melton-jason
Copy link
Contributor Author

I've (finally) resolved all of the failing tests.
Can this briefly be tested again? If everything is still good, then this can be merged! 🥳

@melton-jason melton-jason requested a review from a team April 9, 2023 18:57
@grantfitzsimmons
Copy link
Member

@chanulee1 Can you test this today?

@melton-jason
Copy link
Contributor Author

Can't believe I didn't catch this before. Can we have Collection Rel Type be required, or put a default value in? It throws this error otherwise - it should just highlight it red. image

Also have it highlight the empty catalog number (or any other required cells) after validation if they are empty. I was trying to validate it without putting in a catalog number. image

Collection Rel Type and the Left/Ride side Collection Objects are required, but only in the frontend (as schema overrides). Thus they are not actually required on a database level.

The cell highlighting is done by checking if the SpLocaleContainerItem associated with each column of the upload is required in the database.

required_by_schema = colopts.schemaitem and colopts.schemaitem.isrequired
result: Union[ParseResult, ParseFailure]
was_blank = value_in.strip() == ""
if was_blank:
if colopts.default is None:
missing_required = (
"field is required by upload plan mapping" if not colopts.nullAllowed else
"field is required by schema config" if required_by_schema else
None
)
result = ParseResult({fieldname: None}, {}, None, colopts.column, missing_required)

Preferably, we run a Django migration which properly marks the correct fields as required in the database. However, to be thorough, we would additionally need to modify the Specify6 datamodel xml (The django fields are built from the xml). We may even

Alternatively, we can make schema overrides for the backend. However, I do not like this solution because it fragments the schema even further. There would be front-end overrides, back-end overrides, and then what the value is in the database.

@melton-jason
Copy link
Contributor Author

I have talked with @grantfitzsimmons and we have reached a conclusion about how to approach this.
The solution will be painful regardless of what we do.

If we solve the root of the problem, alongside a django migration, we would need to modify Specify's 6 specify_datamodel.xml, as Specify 7 relies on it to build the datamodel. However, the specify_datamodel.xml is autogenerated by Specify 6 when it is compiled, so this would required a Specify 6 update.
(Preferably we move away from relying on the outdated xml. Having a Specify 7-only representation of the datamodel would also fix #2813)

Alternatively, adding schema overrides to the backend would needlessly make debugging in the future more difficult, as the schema would be split into 3 pieces (the frontend representation, backend, and database) and cost more development time in the future. The solution would be 'hacky', and not long term.

For now, unless any further ideas are brought forward, it may be wise to leave this as is for now and provide documentation explaining the behavior. A proper solution-regardless of the way we go-will warrant further discussion.

@grantfitzsimmons
Copy link
Member

@realVinayak and @chanulee1 – can you test this when you get a chance?

melton-jason added a commit that referenced this pull request Apr 28, 2023
This restriction needs to be put into place until a proper fix for
the Host Taxon relationship is implemented

The issue is that any Host Taxon uploaded through the WorkBench or
created using the '+' button on the QueryCombobox will always be
created in the current collection.

Related: #3240, #2675
Copy link
Contributor

@chanulee1 chanulee1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Right side collection object validated using right collection's catalog number format. Right side collection object is created when catalog number does not exist in right side collection. Collection Relationship uploads seem to be working well.

@realVinayak realVinayak self-requested a review May 23, 2023 04:38
@grantfitzsimmons grantfitzsimmons merged commit bb85a94 into production May 26, 2023
@grantfitzsimmons grantfitzsimmons deleted the issue-3089 branch May 26, 2023 20:19
@realVinayak
Copy link
Contributor

realVinayak commented May 18, 2024

See #4929 for an alternate implementation. The main difference is that I consider scoping completely different than binding. Instead of scoping upload plan "constantly" (which then required defer logic per-row), I scope it for every row (so defer isn't needed -- just scope). Simplifies things this way.

For performance, observe that upload plan only needs to be rescoped if there are collection relationships (or any other "defer" fields). So, it checks for them when doing scoping. If not found, it is safe to cache it. Hence, the little caching mechanism in that implementation. That is, the upload plan gets scoped just once (for the first row, the next rows reuse it as long as no defer fields are found), which is the common case (most wb uses don't involve collection relationships).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Uploading Collection Object Rels not working in the WorkBench
5 participants