Skip to content

Bug in beta tokenizer #178

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
Qtax opened this issue Jan 8, 2024 · 8 comments
Closed

Bug in beta tokenizer #178

Qtax opened this issue Jan 8, 2024 · 8 comments
Labels
no-stale Prevent making as stale released

Comments

@Qtax
Copy link
Contributor

Qtax commented Jan 8, 2024

Just looking at the code I think there is a bug here:

https://github.com/scriptcoded/sql-highlight/blob/909c361424255f5af2a5e87b96dcd9e93e61123d/lib/index.js#L42C21-L42C21

This only matches litteral periods .. I assume that you meant to match any character /(?<unknown>.+?)/.

Without this change (now that it's using .matchAll()) anything in the input SQL string that does not match the regex will be removed from the results and not visible to the user. I assume that is undesired.

(Also of note that this approach will at most match a single character in every unknown group match.)

And a bit off topic, but I'm using this highlighter with a non-standard SQL flavor, which uses several additional keywords. These keywords will now be highlighted as identifiers. I hope the highlighting for those is not too disturbing. 😅

@scriptsbot
Copy link
Collaborator

This issue has been marked as stale because it has been open for 14 days with no activity. Remove the stale label or comment or this will be closed in 5 days.

@scriptsbot
Copy link
Collaborator

This issue was closed because it has been stalled for 5 days with no activity.

@scriptsbot scriptsbot closed this as not planned Won't fix, can't repro, duplicate, stale Jan 29, 2024
@scriptcoded scriptcoded added no-stale Prevent making as stale and removed Stale labels Feb 1, 2024
@scriptcoded scriptcoded reopened this Feb 1, 2024
@scriptcoded
Copy link
Owner

As good a time as any to give some form of response at least. Simply put I haven't had that much time the last few weeks due to various reasons. But next week is vacation, maybe I'll have some time then 😄 No promises though

@scriptcoded
Copy link
Owner

And a bit off topic, but I'm using this highlighter with a non-standard SQL flavor, which uses several additional keywords. These keywords will now be highlighted as identifiers. I hope the highlighting for those is not too disturbing. 😅

Covered by #189

@wkeese
Copy link
Contributor

wkeese commented Apr 6, 2024

Woops, that does seem like a mistake, not sure why I put the backslash there. Or for that matter, the + sign.

wkeese added a commit to wkeese/sql-highlight that referenced this issue Apr 10, 2024
Rearrange regexps so special category catches everything not caught by other categories.
Eliminates the "unknown" category I added in an earlier PR.

Fixes scriptcoded#178
@wkeese
Copy link
Contributor

wkeese commented Apr 13, 2024

I submitted #193, it simultaneously fixes this ticket (which is a regression from my #148) and fixes #150.

wkeese added a commit to wkeese/sql-highlight that referenced this issue Apr 14, 2024
Rearrange regexps so "special" regexp catches everything not caught by other regexps.
Eliminates the "unknown" segment I added in scriptcoded#148.

In practice, the only time we would hit the "unknown" segment is for a few weird
character that weren't already caught by the "special" segment, for example, the
? and unclosed ` in "a `?> b".  I figured that in those rare cases, we might as
well just call those characters "special".

Fixes scriptcoded#178, refs scriptcoded#148.
wkeese added a commit to wkeese/sql-highlight that referenced this issue Apr 14, 2024
Rearrange regexps so "special" regexp catches everything not caught by other regexps.
Eliminates the "unknown" segment I added in scriptcoded#148.

In practice, the only time we would hit the "unknown" segment is for a few weird
character that weren't already caught by the "special" segment, for example, the
? and unclosed ` in "a `?> b".  I figured that in those rare cases, we might as
well just call those characters "special".

Fixes scriptcoded#178, refs scriptcoded#148.
wkeese added a commit to wkeese/sql-highlight that referenced this issue Apr 14, 2024
Rearrange regexps so "special" regexp catches everything not caught by other regexps.
Eliminates the "unknown" segment I added in scriptcoded#148.

In practice, the only time we would hit the "unknown" segment is for a few weird
characters that weren't already caught by the "special" segment, for example, the
? and unclosed ` in "a `?> b".  I figured that in those rare cases, we might as
well just call those characters "special".

Fixes scriptcoded#178, refs scriptcoded#148.
scriptcoded pushed a commit that referenced this issue Jul 2, 2024
Rearrange regexps so "special" regexp catches everything not caught by other regexps.
Eliminates the "unknown" segment I added in #148.

In practice, the only time we would hit the "unknown" segment is for a few weird
characters that weren't already caught by the "special" segment, for example, the
? and unclosed ` in "a `?> b".  I figured that in those rare cases, we might as
well just call those characters "special".

Fixes #178, refs #148.
scriptcoded pushed a commit to wkeese/sql-highlight that referenced this issue Jul 2, 2024
Rearrange regexps so "special" regexp catches everything not caught by other regexps.
Eliminates the "unknown" segment I added in scriptcoded#148.

In practice, the only time we would hit the "unknown" segment is for a few weird
characters that weren't already caught by the "special" segment, for example, the
? and unclosed ` in "a `?> b".  I figured that in those rare cases, we might as
well just call those characters "special".

Fixes scriptcoded#178, refs scriptcoded#148.
scriptcoded pushed a commit that referenced this issue Jul 2, 2024
Rearrange regexps so "special" regexp catches everything not caught by other regexps.
Eliminates the "unknown" segment I added in #148.

In practice, the only time we would hit the "unknown" segment is for a few weird
characters that weren't already caught by the "special" segment, for example, the
? and unclosed ` in "a `?> b".  I figured that in those rare cases, we might as
well just call those characters "special".

Fixes #178, refs #148.
@scriptcoded
Copy link
Owner

#193 is merged to beta and will be released soon. Closing this a bit prematurely. Feel free to reopen if you don't think it's fixed.

scriptcoded pushed a commit that referenced this issue Jul 2, 2024
Rearrange regexps so "special" regexp catches everything not caught by other regexps.
Eliminates the "unknown" segment I added in #148.

In practice, the only time we would hit the "unknown" segment is for a few weird
characters that weren't already caught by the "special" segment, for example, the
? and unclosed ` in "a `?> b".  I figured that in those rare cases, we might as
well just call those characters "special".

Fixes #178, refs #148.
scriptsbot added a commit that referenced this issue Jul 2, 2024
# [5.0.0](v4.4.2...v5.0.0) (2024-07-02)

* chore!: add support for Node 22 ([9478bf1](9478bf1))

### Bug Fixes

* improve number detection ([02d459a](02d459a)), closes [#149](#149)
* improve operator detection ([183a4fb](183a4fb)), closes [#150](#150)
* typo in unknown segments ([70af287](70af287)), closes [#148](#148) [#178](#178) [#148](#148)

### Features

* add way to style identifiers ([25677d4](25677d4)), closes [#147](#147)
* release 5.1.0 ([cb0c0f1](cb0c0f1))

### BREAKING CHANGES

* The `default` segment has been split into `identifier` and `whitespace`
segments.  There's also a new `unknown` segment that will only show up for malformed
SQL such as an unclosed string.

However, the highlight() function works largely the same as before, both normal mode and HTML mode,
except for the bug fix to stop classifying identifiers as strings.  In other words, SQL like

select * from EMP where NAME="John Smith"

will get highlighted the same as before, i.e. no syntax highlighting for EMP or NAME.
* drop support for Node 14.
scriptsbot added a commit that referenced this issue Jul 2, 2024
# [5.0.0](v4.4.2...v5.0.0) (2024-07-02)

* chore!: add support for Node 22 ([9478bf1](9478bf1))

### Bug Fixes

* improve number detection ([02d459a](02d459a)), closes [#149](#149)
* improve operator detection ([183a4fb](183a4fb)), closes [#150](#150)
* typo in unknown segments ([70af287](70af287)), closes [#148](#148) [#178](#178) [#148](#148)

### Features

* add way to style identifiers ([25677d4](25677d4)), closes [#147](#147)

### BREAKING CHANGES

* The `default` segment has been split into `identifier` and `whitespace`
segments.  There's also a new `unknown` segment that will only show up for malformed
SQL such as an unclosed string.

However, the highlight() function works largely the same as before, both normal mode and HTML mode,
except for the bug fix to stop classifying identifiers as strings.  In other words, SQL like

select * from EMP where NAME="John Smith"

will get highlighted the same as before, i.e. no syntax highlighting for EMP or NAME.
* drop support for Node 14.
scriptsbot added a commit that referenced this issue Jul 2, 2024
# [6.0.0](v5.0.0...v6.0.0) (2024-07-02)

### Bug Fixes

* improve number detection ([02d459a](02d459a)), closes [#149](#149)
* improve operator detection ([183a4fb](183a4fb)), closes [#150](#150)
* typo in unknown segments ([70af287](70af287)), closes [#148](#148) [#178](#178) [#148](#148)

### Features

* add way to style identifiers ([25677d4](25677d4)), closes [#147](#147)
* release 5.1.0 ([3a58def](3a58def))

### BREAKING CHANGES

* The `default` segment has been split into `identifier` and `whitespace`
segments.  There's also a new `unknown` segment that will only show up for malformed
SQL such as an unclosed string.

However, the highlight() function works largely the same as before, both normal mode and HTML mode,
except for the bug fix to stop classifying identifiers as strings.  In other words, SQL like

select * from EMP where NAME="John Smith"

will get highlighted the same as before, i.e. no syntax highlighting for EMP or NAME.
@scriptsbot
Copy link
Collaborator

🎉 This issue has been resolved in version 6.0.0 🎉

The release is available on:

Your semantic-release bot 📦🚀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
no-stale Prevent making as stale released
Projects
None yet
Development

No branches or pull requests

4 participants