Enhanced Keywords #1382

Merged (65 commits) on Oct 25, 2022

Conversation

@manticore-projects (Contributor) commented Oct 18, 2021

Rewrite the Reserved Keywords logic:

  1. Automatically derive all Keyword Tokens from the Parser.
  2. Explicitly add Reserved Keywords to the Grammar and classify why/when each is reserved.
  3. Auto-generate the list of Allowed Keywords as the difference of 1) and 2) above (see the sketch after this list).
  4. Provide a Gradle task to semi-automatically generate RelObjectNameWithoutValue() based on 3).
  5. Add parametrized tests covering all Keywords.
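
A minimal sketch of points 3) and 4), assuming made-up token names and hard-coded stand-ins for the keyword lists (the real task derives them from the Grammar):

// Sketch only -- token names, lists and the emitted text are illustrative
// assumptions, not the PR's actual generator output.
import java.util.Set;
import java.util.TreeSet;

public class AllowedKeywordsSketch {
    public static void main(String[] args) {
        // 1) all keyword tokens, as derived from the parser/grammar
        Set<String> allKeywords = new TreeSet<>(Set.of("SELECT", "FROM", "QUICK", "LOCKED"));

        // 2) keywords explicitly declared RESERVED in the grammar
        Set<String> reservedKeywords = Set.of("SELECT", "FROM");

        // 3) allowed keywords = difference of 1) and 2)
        Set<String> allowedKeywords = new TreeSet<>(allKeywords);
        allowedKeywords.removeAll(reservedKeywords);

        // 4) emit choices that could be pasted into RelObjectNameWithoutValue()
        allowedKeywords.forEach(k -> System.out.println("| tk=<K_" + k + ">"));
    }
}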

Advantages:
a) We now have clear documentation of which Keywords are reserved and for what reason, backed by proper tests.
b) Allowed Keywords are generated (semi-)automatically, especially when more Tokens are added to the Grammar. New Tokens won't break SQL statements that parsed fine before.

To-dos:
c) Composite Tokens do not work well and still need to be added manually to ALL_KEYWORDS (we would need to refactor the Grammar in order to avoid such composite tokens).
d) @wumpz: Clarify the meaning/purpose of the various RelObjectNamexxxx() productions, so we can auto-generate them too.
e) Use the Gradle task to fully inject the RelObjectNamexxxx() productions (instead of manually updating the code).

Resolves one more Special Oracle Test.
Fixes #1148
Fixes #1450
Fixes #1443
Fixes #1462
Fixes #1508
Fixes #1538
Fixes #1650

Add Keywords and document which keywords are allowed for what purpose
@wumpz (Member) left a comment

Again: are all those keyword additions to the RelObjectName productions needed? You talked about all those edge-case improvements. Aren't most of these additions exactly that?

@@ -1733,7 +2007,7 @@ Table TableWithAlias():
Alias alias = null;
}
{
-   table=Table() [alias=Alias() { table.setAlias(alias); }]
+   table=Table() [ LOOKAHEAD(2) alias=Alias() { table.setAlias(alias); }]
Member:

The aim should be to use fewer LOOKAHEADs, not more. All this for some keywords?

@manticore-projects (Author):

Yes, all this for some keywords (although it is mostly the Alias() production that is affected).
But why exactly would you deny Tokens which are not defined as keywords in the holy SQL:2016 standard, when there is no compelling technical reason?

Complaints about keywords feel like the second biggest concern of the users, right after quoting/escaping of text and objects.

Contributor:

To me it's not directly obvious why this LOOKAHEAD is needed. Is that explained somewhere?

Derive All Keywords from Grammar directly
Generate production for Object Names (semi-) automatically
Add parametrized Keyword Tests
@manticore-projects (Author) commented Nov 1, 2021

@wumpz: Please check out the great work on PR #1254. It adds a new token QUICK, which of course should not become a Reserved Keyword.

Of course the author @OlivierCavadenti has no way of knowing that he would need to add QUICK to the RelObjectNamexxxx() productions in order to allow this token as an Object Identifier. (And how should he? I figured that out only six months after my first contribution.)

This PR would solve this challenge reliably, preventing new tokens from breaking valid statements and easing the entry for new contributors.
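
For illustration, assuming QUICK exists as a Token but is not whitelisted in RelObjectNameWithoutValue(), a previously valid statement like the made-up one below would stop parsing:

import net.sf.jsqlparser.JSQLParserException;
import net.sf.jsqlparser.parser.CCJSqlParserUtil;

public class QuickTokenExample {
    public static void main(String[] args) {
        try {
            // "quick" used as a column name and as a table alias:
            // this parses only while QUICK is allowed as an object identifier
            CCJSqlParserUtil.parse("SELECT quick FROM mytable quick");
            System.out.println("parsed fine");
        } catch (JSQLParserException ex) {
            System.out.println("broken by the new token: " + ex.getMessage());
        }
    }
}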

@@ -227,6 +242,13 @@ task renderRR() {
}
}
}

task updateKeywords(type: JavaExec) {
Member:

Since I do not use gradle, what does this do?

Maven is the main build engine. You changed some gradle build options. Does the maven build still run?

@manticore-projects (Author):

It generates and prints the source code text of the method RelObjectName(), which you would then manually insert/replace in the JSQL Grammar file.

This step is optional and does not affect the Maven build. It is executed manually and on demand only.

The long-term goal, though, is to have a mechanism inside the build tool (Gradle and/or Maven) which, during the build:

  1. analyses the Keywords automatically
  2. modifies the Grammar file (methods RelObjectName() and friends; see the sketch below)
  3. builds the Parser with the modified Grammar
  4. runs the Keyword Tests
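
A rough sketch of step 2. only, under stated assumptions: the grammar path and the marker comments are made up, and the generated production text would come from the keyword analysis in step 1.:

// Sketch only -- file path and marker comments are assumptions, not part of the PR.
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class GrammarInjectionSketch {
    public static void main(String[] args) throws IOException {
        Path grammarFile = Path.of("src/main/jjtree/net/sf/jsqlparser/parser/JSqlParserCC.jjt");
        String grammar = Files.readString(grammarFile);

        String generatedProduction = "..."; // produced by the keyword analysis step

        // replace everything between two marker comments with the generated production
        String begin = "// BEGIN generated RelObjectNameWithoutValue";
        String end = "// END generated RelObjectNameWithoutValue";
        int from = grammar.indexOf(begin) + begin.length();
        int to = grammar.indexOf(end);

        String patched = grammar.substring(0, from)
                + System.lineSeparator() + generatedProduction + System.lineSeparator()
                + grammar.substring(to);
        Files.writeString(grammarFile, patched);
    }
}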

Contributor:

I think this might also be implemented as a mojo in Maven, but that might not be worth the effort.

@wumpz (Member) commented Nov 28, 2021

The naming of these RelObjectName... methods is surely an issue that should be addressed in further improvements. So sorry for that: it's grown... ;)

@wumpz (Member) commented Nov 28, 2021

I am not a fan of injecting those keywords, since not every keyword can be introduced this way; some need other parts of the grammar to be modified. However, building this keyword testing with JUnit 5 is cool.

@manticore-projects (Author):

The naming of these RelObjectName... methods is surely an issue that should be addressed in further improvements. So sorry for that: it's grown... ;)

You would do me a great favor by documenting the purpose of those 5 methods verbosely. I tried my best to figure it out by trial and error, but I am still not exactly sure why we ended up with 5 methods. (And no worries about grown code.)

@manticore-projects (Author) commented Nov 29, 2021

I am not a fan of injecting those keywords, since not every keyword can be introduced this way; some need other parts of the grammar to be modified. However, building this keyword testing with JUnit 5 is cool.

Honestly, I am not a big fan of that PR either -- but after spending many days on that matter and trying various things, it was the best approach I found, and in my opinion it is still much better than the current state:

  1. At the moment, Keywords are just a mess and we get a lot of issues about keywords alone.
    Are you able to document right now which keywords are reserved and for what reason? I do not think so, and the PR would solve this documentation problem.

  2. Today, even when you have an opinion on keywords, we have no parametrized tests.
    You kindly admit that these tests are kind of cool, and I appreciate your feedback. Those parametrized tests, however, depend on a well-defined list of keywords, which the PR provides. (Although we can argue, of course, how/where exactly such a list should be defined.)

not every keyword can be introduced this way; some need other parts of the grammar to be modified

  1. While that is true for the PR, it is already true today. Also, we could easily enforce a policy of "Keywords are defined by simple Tokens" (not allowing complex Tokens for Keywords).

The workflow is to: a) define the Tokens and Productions in the Grammar first, and then b) document any new RESERVED Keywords only in the list (emphasis on RESERVED).

The PR has one big advantage: by defining RESERVED Keywords only, it will automatically determine the allowed Keywords and modify the Grammar semi-automatically.

@manticore-projects (Author):

Allow me to ask, please:

  1. What would be your preferred way to manage and document RESERVED KEYWORDs?

  2. What would be your preferred way to maintain the allowed Tokens/Keywords in RelObjectName()?

  3. What would be your preferred way to test all the possible Keywords (automatically)?

And can we have a Skype chat about this? :-)

@manticore-projects (Author):

Since you only have a list of all whitelisted keywords in ParserKeywordsUtil, new tokens are not detected. So every addition of tokens needs to be recognized by a developer.

No, it is exactly the opposite: Restricted Keywords are rather static and would normally not be touched. A developer adds to All Keywords only by adding tokens/productions to the Grammar. The Gradle task then builds the Whitelisted Keywords automatically and confirms them by brute-force testing.

Only when those tests fail would we have to amend the Restricted Keywords. This could even be automated; it is simple Java code after all (or it could be a text file or whatever).

Where I still fail to understand your plan is this point:

  1. We agree to run the brute-force tests in order to identify/confirm Restricted Keywords, don't we?
  2. But those tests will work only AFTER updating the Whitelisted Keywords in RelObjectNameWithoutValue() -- because the tests rely on the final grammar.

So how would you run your Tests without whitelisting in the grammar first?
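
A minimal sketch of such a brute-force test, assuming JUnit 5 and a hypothetical stand-in for the generated whitelist (the PR's actual test classes may be organised differently):

import java.util.stream.Stream;
import net.sf.jsqlparser.parser.CCJSqlParserUtil;
import org.junit.jupiter.params.ParameterizedTest;
import org.junit.jupiter.params.provider.MethodSource;

class AllowedKeywordsBruteForceTest {

    // stand-in for the generated whitelist of allowed keywords
    static Stream<String> allowedKeywords() {
        return Stream.of("QUICK", "LOCKED", "KEY", "TYPE");
    }

    // every whitelisted keyword must work as column name, alias and table name;
    // the test fails as soon as a new token is neither whitelisted nor reserved
    @ParameterizedTest
    @MethodSource("allowedKeywords")
    void allowedKeywordWorksAsObjectName(String keyword) throws Exception {
        CCJSqlParserUtil.parse("SELECT " + keyword + " AS " + keyword + " FROM " + keyword);
    }
}

If a new Token makes one of these statements unparseable, the failing parameter points directly at the keyword that still needs to be whitelisted or declared reserved.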

@wumpz (Member) commented Sep 25, 2022

So how would you run your Tests without whitelisting in the grammar first?

Simply put, this test will fail until you whitelist or restrict the new tokens. That's what forces developers to make this decision (for us). You are right that the restricted list should change rarely. The test should then give advice on how and where this change should be done.
I am a fan of brute-force testing the changes a developer (can) make. In a sense this is the same as enforcing a specific code style.

No, it is exactly the opposite: Restricted Keywords are rather static and would normally not be touched. A developer adds to All Keywords only by adding tokens/productions to the Grammar. The Gradle task then builds the Whitelisted Keywords automatically and confirms them by brute-force testing.

I was not aware of this automatic whitelist keyword building. So this is +1.

@wumpz (Member) commented Oct 16, 2022

So are you going to resolve the conflicts? This improvement of parsing could be done afterwards.

@vipcxj commented Oct 24, 2022

@wumpz When could this pr be merged? The parse error is really annoying.

@manticore-projects (Author):

@wumpz When could this pr be merged? The parse error is really annoying.

I can supply you with a bleeding-edge JAR file including those PRs, in case it's urgent.

@wumpz merged commit 4863eb5 into JSQLParser:master on Oct 25, 2022
@wumpz (Member) commented Oct 25, 2022

So after merging, I am not able to build anymore since ConditionalKeywordsTest fails. (I had also merged another PR.)

So how should I proceed if I am not willing to use gradle? Isn't that the same problem you had with my proposal? Maven is the main build pipeline.

@manticore-projects (Author):

Good Morning. Thank you for finally accepting the PR.

What caused the problem:

  1. You accepted another PR before (introducing the new token LOCKED).
  2. LOCKED has not been added to the white- or black-list.

Solution:
Amend the white- or black-list and then run the buildKeywords task.

This should have been done by the author of the LOCKED PR; from now on, any author will be forced to do so.

It is exactly what you requested in your explanations.

I will nevertheless run through this procedure today and send a very small PR.
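
For illustration only, assuming the black-list were a plain Java collection (the real structure in ParserKeywordsUtils may differ and may carry additional restriction flags), the amendment described above would be a one-line change, followed by re-running the generator and the tests:

// Hypothetical shape of the black-list -- names and structure are assumptions.
import java.util.List;

public class ReservedKeywordsSketch {
    static final List<String> RESERVED_KEYWORDS = List.of(
            "SELECT",
            "FROM",
            "WHERE",
            "LOCKED"    // new token: list it here only if it really must be reserved,
                        // otherwise leave it out and let the generator whitelist it
    );
}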

@manticore-projects (Author):

So how should I proceed if I am not willing to use gradle?

If you insist on not using gradle updateKeywords, then you can still amend RelObjectNameWithoutValue manually and add the newly whitelisted token -- the same as it has always been done before.

I will also provide an equivalent Maven task soon, since we are moving forward now.

The workflow should be like this:

1) Amend Tokens in the Grammar
2) When keyword-tests fail, run gradle updateKeywords
3) Then build via mvn package or gradle build (it does not matter)
4) Push/Deploy

Using the gradle updateKeywords task is optional; you can also add the new Tokens manually, as it has always been done before.

Now that the Keywords PR is accepted, any further PR will automatically fail the acceptance tests unless the keywords are in order.

@wumpz (Member) commented Oct 28, 2022

No developer that is using Maven knows what to do here. So this workflow has to be included somewhere in a developer guide. At the moment this is only your tool, since nobody knows about it and not everybody uses Gradle.

@wumpz (Member) commented Oct 28, 2022

Since we are not willing to run the gradle task, and your tests use ParserKeywordsUtils.getReservedKeywords, how would updating only the grammar be enough?

@wumpz (Member) commented Oct 28, 2022

Additionally, since some JavaCC methods are now in place, there is a direct dependency on JavaCC.

@manticore-projects (Author):

Additionally, since some JavaCC methods are now in place, there is a direct dependency on JavaCC.

Yes, because you insisted on parsing the Grammar using JavaCC instead of using Regular Expressions. For that reason I suggest using Regular Expressions instead.
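
A minimal sketch of such a Regular-Expression based extraction, assuming simple keyword Tokens are declared in the grammar roughly as <K_QUICK: "QUICK"> (both the pattern and the grammar path are assumptions):

// Sketch only -- the token-declaration pattern and the grammar path are assumptions.
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexTokenExtractionSketch {
    public static void main(String[] args) throws IOException {
        String grammar = Files.readString(
                Path.of("src/main/jjtree/net/sf/jsqlparser/parser/JSqlParserCC.jjt"));

        // matches simple keyword declarations such as  <K_QUICK: "QUICK">
        Pattern simpleToken = Pattern.compile("<\\s*(K_\\w+)\\s*:\\s*\"\\w+\"\\s*>");

        Set<String> allKeywordTokens = new TreeSet<>();
        Matcher matcher = simpleToken.matcher(grammar);
        while (matcher.find()) {
            allKeywordTokens.add(matcher.group(1));
        }
        allKeywordTokens.forEach(System.out::println);
    }
}

This keeps the generator free of any compile-time JavaCC dependency.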

Since we are not willing to run the gradle task, and your tests use ParserKeywordsUtils.getReservedKeywords, how would updating only the grammar be enough?

Because this is exactly what you would have done before the semi-automation: edit the production RelObjectNameWithoutValue manually. You could still very much do that, and we should keep that option at all costs.

That said, I have added a Maven Task for running the same updateKeywords task.
You can run it via:

mvn compile exec:java

No developer that is using Maven knows what to do here. So this workflow has to be included somewhere in a developer guide. At the moment this is only your tool, since nobody knows about it and not everybody uses Gradle.

Yes, of course. It has been documented in detail on the new website, which I committed a month or so ago: https://manticore-projects.com/JSQLParser/contribution.html#manage-reserved-keywords

I have updated the Keywords2 PR #1653 accordingly with the Maven task you requested.

@d2a-raudenaerde (Contributor):

The Maven task could be implemented as a mojo (maven-plugin), I think; are you interested in that? Then you could run this plugin from the pom.xml during, for example, the generate-sources phase.

@d2a-raudenaerde (Contributor):

I have some experience writing these Maven plugins, so if I find some time this week, maybe I could draft up some integration so it 'just works' when you run, for example, mvn compile.

@d2a-raudenaerde (Contributor) commented Oct 31, 2022

But probably this will work as well:

<plugin>
    <groupId>org.codehaus.mojo</groupId>
    <artifactId>exec-maven-plugin</artifactId>
    <executions>
        <execution>
            <id>codegeneration</id>
            <phase>generate-resources</phase>
            <goals><goal>java</goal></goals>
            <configuration>
                <mainClass>com.codegenerator.CodeGeneratorApplication</mainClass>
            </configuration>
        </execution>
    </executions>
</plugin>

@manticore-projects (Author):

Thanks! I already submitted another PR like that on Saturday.

@d2a-raudenaerde (Contributor):

Ah excellent!
I'll have to catch up I see :)

@d2a-raudenaerde (Contributor):

I can't find that PR. I think there might also be a chicken-and-egg problem, as the Java code that does the work also needs to be compiled first. (You can solve that with a multi-module Maven build, but that requires a bit more refactoring.)

@manticore-projects (Author):

This one: #1653

It's pretty straightforward actually, since we can execute the Java code directly without compiling the whole package. Also, this step is executed on demand only: when new tokens are introduced and/or the Keyword Tests fail.

@wumpz (Member) commented Nov 2, 2022

Yes, because you insisted on parsing the Grammar using JavaCC instead of using Regular Expressions. For that reason I suggest using Regular Expressions instead.

That's why, e.g., my proposal runs only in scope test; then this dependency is not needed. I was talking about the (IMHO) right way to parse the grammar, not about the context in which this parsing should happen. I thought this was obvious. For the same reason, the JSqlParser jar should not include this generation class. It is part of the build process (I know you do not accept that, but I think it's evident). So a quick solution would be to put all of it in the test folder and give this JavaCC dependency the scope test.

However, to get a clean build again, use Regular Expressions for now.

That said, since this is a build step (or could be one), it should be separated from the JSqlParser source code and put into some kind of build-helper module, which could then definitely be used when JSqlParser is built. Right now, if you clone this project the build will fail, since the generator class is not yet built.

I have updated the Keywords2 PR #1653 accordingly with the Maven task you requested.

I have already reviewed it and found the needed information on how to run it in this issue, so I will merge it.
