Skip to content

Refactor to improve performance w/ weak map, hoisted regex #13

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Apr 3, 2024
Merged

Refactor to improve performance w/ weak map, hoisted regex #13

merged 3 commits into from
Apr 3, 2024

Conversation

bluwy
Copy link
Contributor

@bluwy bluwy commented Apr 3, 2024

  • This PR hoists regexes as top-level variables to prevent constantly creating them.
  • It also implements caching for charactersToExpression to prevent repeatedly created regexes dynamically.

Benchmark

During my benchmarking of a project using toHtml heavily from hast-util-to-html, toHtml took about 6.06s in execution time. This PR drops it to 2.38s. toHtml uses this package which is hot code during its serializeAttribute call. Examples: https://github.com/search?q=repo%3Asyntax-tree%2Fhast-util-to-html%20subset%3A&type=code

Notes on caching

The WeakMap caching for charactersToExpression means that if the array is mutated, the regex won't be re-built but I think this is a fine caveat that shouldn't be relied upon. If I remove the WeakMap caching, the benchmark only drops to 5.52s.

In hast-util-to-html, there are two places (lib/handle/text.js and lib/handle/comment.js) that can yet to take advantage of the caching. If this PR is accepted, I can update hast-util-to-html to do so, and I expect another 1s savings, bringing down to around 1.4s for toHtml.

Copy link
Owner

@wooorm wooorm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Heya! Nice!

Only thing I’d want is to move the caching part to a charactersToExpressionCached wrapper (which you can define right above charactersToExpression).

Otherwise, maybe a comment explaining why the weak map is so important (as you do here in this PR message).

Also, maybe good for reference, as this is 2 things in 1: do you have the separate numbers for a) weak map, b) hoisted regexes?

@wooorm
Copy link
Owner

wooorm commented Apr 3, 2024

array is mutated

I’m 👍 to using ReadonlyArray in the types. I don’t consider it a breaking change either.

@wooorm
Copy link
Owner

wooorm commented Apr 3, 2024

I’ll work on the xo issue in this repo (edit: done on main!)

@bluwy
Copy link
Contributor Author

bluwy commented Apr 3, 2024

Updated!

Also, maybe good for reference, as this is 2 things in 1: do you have the separate numbers for a) weak map, b) hoisted regexes?

For weak map only, I measured toHtml and it dropped to 2.69s. For hoisted regexes only, it dropped to 5.52s. If both are applied, it dropped to 2.38s. (NOTE: Initial timing is 6.06s)

So it seems like hoisting the regex has only a small gain, but I think it's because toHtml always passes a subset, so it hits this part only:

value = value.replace(
options.subset ? charactersToExpression(options.subset) : /["&'<>`]/g,
basic
)
if (options.subset || options.escapeOnly) {
return value
}

Skipping most of the regex work after it. Other usages for stringify-entities should have a bigger impact.

Copy link
Owner

@wooorm wooorm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One last Q!

@wooorm wooorm changed the title Hoist regex as variable Refactor to improve performance w/ weak map, hoisted regex Apr 3, 2024
@wooorm wooorm merged commit 2649911 into wooorm:main Apr 3, 2024
@wooorm
Copy link
Owner

wooorm commented Apr 3, 2024

Thanks! Released in 4.0.4!

@bluwy bluwy deleted the regex-variable branch April 3, 2024 13:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants