Executor performance optimization #314
Conversation
Hey, it is very interesting! Curious to see your final benchmarks. But keep in mind that GraphQL execution is nuanced and has edge cases, so you can't be sure that you've got significant performance benefits until the full test suite passes.

Also, one thing you should be aware of is that this project is a direct port of graphql-js, so the executor mirrors their executor for about 95%. This was a pragmatic choice to make maintenance of this library realistic, and we try to keep as close as possible to the reference implementation exactly because of the cost of maintenance.

So for this PR to be merged into the lib, we must be sure that you will be ready to maintain it for the foreseeable future (meaning bug fixes and keeping it up to date with future versions of the spec). Currently it's a matter of porting changes from the reference implementation, but if we switch to the new executor the cost of changes will be much higher.

Anyway, if you succeed with major performance improvements we can actually back-port them to graphql-js and suggest them for merging (since they have a similar issue - graphql/graphql-js#723). So I am really curious to see your final results!

Now back to your questions:
This is probably a bug.
Yeah, we just started integrating code style checks into the code base (#284). At the moment only
We'd like to avoid severe breaking changes. I guess it shouldn't be that hard to pass an array by reference instead of just stdClass or am I missing something?
Yes there is. Basically the next major version will require at least PHP 7.1.
(force-pushed from f108893 to 0845fb0)
I get now what you meant. I had to give up the idea of a separate compilation phase because of queries like this:

```graphql
query Q($condition: Boolean!) {
  foo @include(if: $condition)
  foo: bar
}
```

This query might call the resolver for either `foo` or `bar`, depending on the value of `$condition`.
However, I kept working with the simplified execution flow, and it got me to an implementation that is, for our "big query", at least twice as fast. The new executor uses coroutines implemented with generators instead of promises and callbacks, so the call graph is much simpler than before. A lot of the performance improvements were gained by memoization.

Another 30% improvement could be gained if schema sanity checks are removed, see jakubkulhan/graphql-php@compiler...faster. However, it would mean some breaking changes (11 tests failing).
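To picture the generators-as-coroutines idea, here is a minimal sketch. It is not the PR's actual code; `resolveField()` and the queue-based driver below are hypothetical names used only to illustrate how a flat scheduling loop can replace nested promise callbacks.

```php
<?php
// Minimal sketch of the generator-as-coroutine technique (illustrative only).

function resolveField(string $responseKey, callable $resolver): \Generator
{
    // In a real executor the yielded value could be a promise; the driver
    // would then resume this generator once the promise settles.
    $value = yield $resolver();

    return [$responseKey => $value];
}

function runCoroutines(array $coroutines): array
{
    $queue  = new \SplQueue();
    $result = [];

    foreach ($coroutines as $coroutine) {
        $queue->enqueue($coroutine);
    }

    // Flat scheduling loop instead of a chain of nested callbacks.
    while (!$queue->isEmpty()) {
        /** @var \Generator $coroutine */
        $coroutine = $queue->dequeue();

        if ($coroutine->valid()) {
            // Synchronous value: resume immediately and re-queue until done.
            $coroutine->send($coroutine->current());
            $queue->enqueue($coroutine);
        } else {
            $result += $coroutine->getReturn();
        }
    }

    return $result;
}

$data = runCoroutines([
    resolveField('hello', function () { return 'world'; }),
    resolveField('answer', function () { return 42; }),
]);

var_export($data); // array('hello' => 'world', 'answer' => 42)
```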
Ok, changes formatted by phpcbf.
Because of coroutine scheduling, I don't know the point at which the resulting array should be checked to determine whether it must be converted to `stdClass`.
As we use this library in production, I intend to do this :)
Sounds promising! Will look into it closer when I have a chance. One major issue for me, though, is that you got rid of PromiseAdapter. There are other projects already which depend on this feature, like dataloader-php, and I know people who use it with ReactPromiseAdapter in production.

Also, I am curious what would happen if we apply memoization to collectSubFields for list items. This optimization is actually already merged into graphql-js (here) but we didn't port it yet. Would be great to compare.

By the way, do you have any benchmarking project? I'd like to play with it too, just need some common ground to check against.
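The memoization being discussed can be pictured roughly as follows. This is a sketch with hypothetical names, not the actual graphql-js or graphql-php code: when completing a list of N items of the same object type, the sub-field set for an identical selection set is collected once and reused.

```php
<?php
// Rough sketch of memoizing sub-field collection per (type, selection set) pair.
// Hypothetical helper, not the library's actual API.

final class SubFieldCollector
{
    /** @var array<string, array> memoized sub-field sets */
    private $memo = [];

    /**
     * @param object[] $fieldNodes AST field nodes selecting this field
     * @param callable $collect    the expensive collection routine
     */
    public function collectSubFields(string $typeName, array $fieldNodes, callable $collect): array
    {
        // Identical AST nodes imply an identical selection set, so key on their identity.
        $key = $typeName . ':' . implode(':', array_map('spl_object_hash', $fieldNodes));

        if (!isset($this->memo[$key])) {
            $this->memo[$key] = $collect($typeName, $fieldNodes);
        }

        return $this->memo[$key];
    }
}
```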
Oh, ignore the promise adapter part, I see it is still there. Just misinterpreted your notes %)
I did a quick benchmark and it looks like the improvement suggested in the reference implementation makes it very close to your implementation (with a slightly better memory footprint). I just pushed this memoized version to master. Can you try it against your complex query?
I created a benchmark for our complex query (it does 1499 resolve calls). Results are similar to yours.

The reason the old executor with memoization looks worse here than in your benchmark is probably the new executor's short-circuit execution.
Apart from performance, also keep an eye on memory footprint. Looks like the new executor is much heavier in this regard.
Fixed, reference cycles needed to be broken.
Numbers are great! I still hesitate, though, over whether a 10ms difference for your 150ms query is worth the complete rewrite (and increased maintenance costs in the future).

But if we continue with the new executor, we will still give users an option to switch between the old and the new one (at least for 0.13.x). So it makes sense to put it alongside the old one (i.e. in the ExperimentalExecutor namespace) and add a static method to switch implementations on demand.

The reason for this is that while a test suite should cover most of the edge cases, there are still unknown unknowns. Pretty sure we'll get unexpected issues, and it would be nice if users could report them and switch to the old implementation without major effort until we fix everything. After the adoption period, we'll remove the switch (probably for 0.14.x).

Please let me know if it works for you.
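For illustration, an opt-in switch along these lines might end up looking roughly like the sketch below. The class and method names (`CoroutineExecutor`, `Executor::setImplementationFactory()`) are assumptions based on this discussion, not confirmed here; the released UPGRADE notes are the authority for the actual API.

```php
<?php
// Hypothetical usage sketch of the opt-in executor switch discussed above.
// Class and method names are assumptions, not confirmed by this thread.

use GraphQL\Executor\Executor;
use GraphQL\Experimental\Executor\CoroutineExecutor;

// Opt in to the experimental executor for the whole application...
Executor::setImplementationFactory([CoroutineExecutor::class, 'create']);

// ...and fall back to the reference implementation if anything misbehaves,
// without waiting for a library fix (again, names assumed):
// Executor::setImplementationFactory([\GraphQL\Executor\ReferenceExecutor::class, 'create']);
```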
@vladar, when do you plan to release version 0.13.x? Our team also uses webonyx in preproduction and we are interested in optimizing the execution stage.
@vladar Yes, that sounds reasonable.
@Smolevich Likely somewhere in September. But I plan to release the memoization improvement of the Executor in the 0.12.x branch in the next couple of days (since it's a cheap and fully backward-compatible change).
Rebased onto master; the new executor implementation has been moved to a separate namespace.
@jakubkulhan, can you share the code of the benchmarks above?
@Smolevich Sorry, I can't; it uses our schema code. However, the query looks like this: https://gist.github.com/jakubkulhan/e938b66d7e498d549d1d8727cddd3659
FYI, I will get back to this PR after porting all of the changes for 0.13, so that we only need to resolve conflicts once before merge. Will post another comment here when ready for merge. |
@vladar Ok, thank you. Let me know and I'll rebase the branch onto master.
After some thinking, I decided to keep the reference executor as the default for the upcoming version. But we will encourage users to try the new Executor in the UPGRADE docs. Given their feedback and how things go, we may change the default implementation in the next version for all users.

The reason is that it is hard for me to read and understand the new code, so maintenance of the executor will be mostly on your shoulders. We must make sure this cooperation works smoothly before making it the default.

So if it works for you, can you rebase it onto master? I am ready to release 0.13.x.
@vladar I'll rebase the new executor code in the next few days. Is there something I can improve so that the code is easier to understand?
(force-pushed from 17aa1ee to 4bc956c)
OK, rebase is complete.
Merged, thanks!

I did some further research, and the interesting thing about the new executor is that it traverses fields in a different order. ReferenceExecutor is depth-first, while the new Executor does a breadth-first traversal (scoped to a parent). Imagine the following query:

```graphql
{
  a1 {
    b1 {
      c1
    }
    d1
  }
  a2 {
    b2 {
      c2
    }
    d2
  }
}
```

The old executor would traverse it depth-first (a1, b1, c1, d1, then a2, b2, c2, d2), while the new one resolves all fields of a parent before descending into their sub-selections. I am curious if the actual performance boost occurs because of this. Will do some research in my spare time. Anyways, this should be totally OK in theory, but we need to collect some real-world feedback on the implications of this. Do you use it in production already?
@jakubkulhan @simPod Code coverage dropped to 84% because it won't run tests against the new Executor. Any ideas if we can make code coverage run the tests twice (with different environment variables)?
@vladar Thank you for merging!

About the order of traversal, I've encountered this in some test cases. Also, the new executor may call resolvers for fields that won't be included in the result due to errors on sibling fields or a sibling's sub-graph. For example, with this schema:

```graphql
type RootQuery {
  foo: Foo!
}

type Foo {
  id: ID!
  foo1: Foo!
  foo2: Foo!
  foo3: Foo!
}
```

and this query:

```graphql
query Q {
  foo {
    id
    foo1 { id }
    foo2 { id }
    foo3 { id }
  }
}
```

AFAIK the old executor won't call resolvers for the remaining sibling fields once such an error occurs, while the new one may.

For coverage, it should be possible to run PHPUnit with the environment variable that switches to the new executor.

We've been using the new executor in production at Scuk.cz without any issues since the end of July, when I committed the implementation that passed all tests. Although I've made one addition to the code we run in production - result caching. It can bypass calling resolvers for a whole sub-graph and read the result from the cache instead. Now that this PR is merged, if you'd be interested, I might send a PR with this change. However, it would be only for the new executor; I don't see how it would be possible to port this to the old executor.
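Conceptually, the sub-graph result caching described here could look something like the sketch below. All names are hypothetical and this is not the code used at Scuk.cz; it only illustrates the idea of serving a completed sub-result instead of running its resolvers.

```php
<?php
// Conceptual sketch only: serve a completed sub-result from a cache so that no
// resolver in that sub-graph runs. All names are hypothetical.

use Psr\SimpleCache\CacheInterface;

function executeCachedSubtree(
    CacheInterface $cache,
    string $cacheKey,               // e.g. derived from field path + arguments
    callable $executeSelectionSet,  // runs the resolvers and returns the completed sub-result
    int $ttl = 300                  // TTL is an arbitrary example
): array {
    $cached = $cache->get($cacheKey);
    if ($cached !== null) {
        return $cached; // whole sub-graph answered from cache, resolvers skipped
    }

    $result = $executeSelectionSet();
    $cache->set($cacheKey, $result, $ttl);

    return $result;
}
```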
It will, however, roughly double the CI test runtime; not a big deal, I guess. But merging the clover reports cannot be avoided. We can eliminate the double runtime by specifying e.g.
Anyone willing to send a PR for the coverage fix in the Travis config? @simPod I think I can live with doubling the test runtime (sorry Travis).
Very interesting. How do you invalidate it (for the whole subgraph)?
I think we can add it in the next version. For now, we should just collect some feedback on the new executor. But it sounds pretty exciting. One thing I am interested in at the moment is benchmarks for the old/new executor (I am still curious to test my hypothesis about the effects of traversal path on performance). If you did something already, I would be very interested to see it.
Wow! Very interesting!!
Would you give me more details on how you optimized the execution performance?
Sounds interesting. Do you maybe have an example of how you did that?
@adri
@jakubkulhan thanks! Do you store the DocumentNode serialized? If yes, did you have issues with library updates changing properties and making unserialize fail?
@adri Yes, it's stored serialized with PHP's native serialize().
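As a rough illustration of that setup, here is a sketch (not their actual code; the file-based cache and function name are made up). Swapping `serialize()`/`unserialize()` for the igbinary equivalents, as suggested in the next comment, should be a drop-in change if that extension is installed.

```php
<?php
// Sketch of caching the parsed DocumentNode with PHP's native serialize().
// The file-based cache and the function name are invented for illustration.

use GraphQL\Language\AST\DocumentNode;
use GraphQL\Language\Parser;

function parseCached(string $query, string $cacheDir): DocumentNode
{
    $file = $cacheDir . '/' . sha1($query) . '.ast';

    if (is_file($file)) {
        $ast = unserialize(file_get_contents($file));
        if ($ast instanceof DocumentNode) {
            return $ast; // cache hit: skip lexing/parsing entirely
        }
    }

    $ast = Parser::parse($query);
    file_put_contents($file, serialize($ast));

    return $ast;
}
```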
@jakubkulhan You should use igbinary for serialization, it has a performance increase over the default:

Native PHP:
Igbinary:
Have you sent a PR for that, or can you tell me how you implemented it? It would be very nice to have sub-graph result caching in the lib.
Hi. First, thank you for the great library! We use it to power our API at Scuk.cz and it's been a smooth ride.
Recently, however, we've discovered poor performance for some of the queries the frontend service sends to the GraphQL endpoint. Over time our schema got pretty big (currently it consists of ~300 types, half of which are object types, and ~1000 resolvable fields on object types). So did the queries (one of the pages in our app queries ~200 fields).
We successfully refactored the GraphQL schema so that types & fields are created lazily. Also, we started caching the parsed AST. This got us nice speed improvements. After these, I started to look at the performance of the `Executor`. It costs us more than 150ms per request for some queries. This is the call graph of the GraphQL endpoint for such a query (library calls are highlighted):

I've started digging in the code and found some easy-to-improve functions (e.g. `Executor::shouldIncludeNode()`). But I estimate these would shave off only a couple of milliseconds. So I've started to work on a new executor:

This is still a work in progress and will need some time before the new executor passes the test suite. Yet, what do you think about this? :)
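For readers wondering what "types & fields are created lazily" can look like in graphql-php, here is a minimal sketch (the type and field names are invented). Passing a closure as the `fields` config defers building the field map until the type is actually used.

```php
<?php
// Minimal sketch of lazily created fields in graphql-php (type/field names invented).

use GraphQL\Type\Definition\ObjectType;
use GraphQL\Type\Definition\Type;

$userType = new ObjectType([
    'name'   => 'User',
    // Evaluated on demand instead of at schema construction time.
    'fields' => function () use (&$userType) {
        return [
            'id'      => Type::nonNull(Type::id()),
            'name'    => Type::string(),
            // Self-reference is possible because the closure runs later.
            'friends' => Type::listOf($userType),
        ];
    },
]);
```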
Questions & notes:
- The `ValueNode` interface doc comment says that `VariableNode` is a `ValueNode`, but `VariableNode` does not implement it. Is it intentional, or a bug?
- Is there a code style the project follows? (I tried to run `composer run lint`, however, PHP-CS reported issues even with current files.)
- The executor currently produces results as arrays (converting empty ones to `stdClass`). I started the new executor working with `stdClass`es instead, because I need pass-by-reference semantics, and it also fixes the JSON-serialization-related issue from the start. But this would be a big breaking change. Should I continue with `stdClass`es, or use arrays?
- `ResolveInfo` is tightly coupled to the AST-traversal execution model. After compilation, the AST is no longer relevant for the new executor. It could include the needed AST nodes in instructions; however, that defies the point of the compilation step.