Perf: Improve performance of multithreaded --check and undefined-field diagnostic #2738


Merged (3 commits, Jul 1, 2024)

Conversation

tomlau10
Contributor

@tomlau10 tomlau10 commented Jul 1, 2024

I have recently been integrating LuaLS into our project's CI and looking for ways to speed up the process. Below are two simple optimizations I found:

Work distribution in multi-threaded --check

I found the recently added multi-threaded --check from #2638. After some code review, I suspect the hash function used to distribute work across threads is suboptimal, which limits its full potential. I simply changed it to round robin (after sorting the file list, of course), and this yields another ~10% performance gain in my project workspace (500+ Lua files). Benchmarks are as follows:

  • CPU: Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz

| threads | ori-1 | ori-2 | ori-3 | ori-avg  | new-1 | new-2 | new-3 | new-avg  | diff |
|---------|-------|-------|-------|----------|-------|-------|-------|----------|------|
| 1       | 38.67 | 38.88 | 38.76 | 38.77    | 38.77 | 39.35 | 38.75 | 38.95667 | 0%   |
| 2       | 25.57 | 25.64 | 25.79 | 25.66667 | 22.93 | 22.89 | 22.86 | 22.89333 | -11% |
| 4       | 17.55 | 17.93 | 17.41 | 17.63    | 15.81 | 15.22 | 15.33 | 15.45333 | -12% |
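The distribution idea can be sketched in a few lines (illustrative Python, not the actual LuaLS Lua code; `hash_partition` and `round_robin_partition` are hypothetical names). Hashing file paths can cluster many files into one bucket, leaving some threads with more work than others, while round robin over a sorted list keeps every worker's share within one file of the rest:

```python
# Illustrative sketch (not the LuaLS source): distributing files
# across k worker threads.

def hash_partition(files, k):
    """Hash-based distribution: bucket sizes can be uneven,
    depending on how file paths happen to hash."""
    buckets = [[] for _ in range(k)]
    for f in files:
        buckets[hash(f) % k].append(f)
    return buckets

def round_robin_partition(files, k):
    """Sort first for determinism, then deal files out one by one."""
    buckets = [[] for _ in range(k)]
    for i, f in enumerate(sorted(files)):
        buckets[i % k].append(f)
    return buckets

files = [f"mod_{i:03}.lua" for i in range(523)]
sizes = [len(b) for b in round_robin_partition(files, 4)]
# Round robin guarantees the largest and smallest buckets differ by
# at most one file, so no thread sits idle while another still works.
assert max(sizes) - min(sizes) <= 1
```

Since --check's wall time is bounded by the busiest thread, evening out the buckets directly shortens the run.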

I would like to mention @emmericp here as this function is originally implemented by him, to see if this small optimization can help shaving off a few more seconds in his project 😄 (great work btw 👍 )

Early break in undefined-field diagnostic

Profiling showed that the undefined-field check is quite time consuming. Looking into the code, I found that its early-break logic can be optimized. The expression #vm.getDefs(src) > 0 is used only to check whether there is any definition for a given src object, so there is no need to fetch the full definitions list: we can return as soon as the first definition is found. This yields a 25% performance gain for this single check within my workspace 🚀

  • setting "diagnostics.severity": { "undefined-field": "Error!" } in .luarc.json, then running with --checklevel=Error
    NOTE: the following was run single-threaded to avoid confusion with the threading optimization above.

|      | 1     | 2     | 3     | avg      |
|------|-------|-------|-------|----------|
| ori  | 15.4  | 15.39 | 15.44 | 15.41    |
| new  | 11.51 | 11.26 | 11.26 | 11.34333 |
| diff |       |       |       | -26%     |
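The early-break idea can be modeled in a few lines (illustrative Python, not the LuaLS code; `iter_defs` is a hypothetical stand-in for definition resolution). Materializing the whole list pays for every resolution step up front, while stopping at the first hit pays for only one:

```python
# Illustrative model (not the LuaLS source) of the early-break idea.
# Each yielded definition stands for one expensive resolution step.
def iter_defs(n_defs, counter):
    for d in range(n_defs):
        counter["steps"] += 1
        yield d

# Original approach: materialize the full list, then test its length,
# the moral equivalent of `#vm.getDefs(src) > 0`.
counter = {"steps": 0}
assert len(list(iter_defs(5, counter))) > 0
assert counter["steps"] == 5  # every definition was resolved

# Optimized approach: return as soon as the first definition appears.
counter = {"steps": 0}
assert next(iter_defs(5, counter), None) is not None
assert counter["steps"] == 1  # only one resolution step paid
```

The saving scales with how many definitions a symbol has, which is why a check that runs this test on every field access benefits so much.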

tomlau10 added 3 commits July 1, 2024 09:35
The current hash function used to distribute work does not seem ideal.
We can instead use round robin to distribute after sorting the file list.
The current early break only needs to know whether any definition exists.
There is no need to fetch the full definitions list;
we can break early as soon as the first one is found.
@C3pa
Contributor

C3pa commented Jul 1, 2024

Very nice, tomlau10! Can we get some insight into how you profiled LuaLS? I am primarily interested in how you found which functions to optimize.

@sumneko
Collaborator

sumneko commented Jul 1, 2024

Great work!

@sumneko sumneko merged commit ebbc372 into LuaLS:master Jul 1, 2024
9 of 11 checks passed
@tomlau10
Contributor Author

tomlau10 commented Jul 1, 2024

Can we get some insight into how you profiled the LuaLS?

@C3pa Nothing advanced; I just profiled each diagnostic one by one 😂 by setting up the .luarc.json in my workspace like this:

"diagnostics.severity": {
    "undefined-field": "Error!"
}

And then I run LuaLS with --check --checklevel=Error, which only checks diagnostics with severity Error or above (in this example setup, only undefined-field).

During my profiling, I also found that the invisible diagnostic is quite time consuming (on par with the undefined-field check), but currently I have no idea what could be optimized there. 😕


edit:

I am primarily interested in how you found which functions to optimize.

Oh~ I guess you are asking how I found that #vm.getDefs(src) > 0 could be optimized? Nothing special, just programmer instinct 😄

@C3pa
Contributor

C3pa commented Jul 1, 2024

Interesting. Thanks for your insight! I thought you might have used some kind of profiler setup.
