-
Notifications
You must be signed in to change notification settings - Fork 218
WIP: feat: depends on wait condition design #994
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
805134d
to
a57bca1
Compare
# Conflicts: # operator-framework/src/test/java/io/javaoperatorsdk/operator/sample/standalonedependent/StandaloneDependentTestReconciler.java
Will add more tests, but this is the basic idea. (Also there are flaky tests, probably not coming from this PR, will make a separate one, especially for StandloneDependentResoruce) |
Kudos, SonarCloud Quality Gate passed! |
while (Instant.now().isBefore(deadline)) { | ||
resource = supplier.get(); | ||
if (resource.isPresent() && condition.isFulfilled(resource.get())) { | ||
return; | ||
} else { | ||
var timeLeft = Duration.between(Instant.now(), deadline); | ||
if (timeLeft.isZero() || timeLeft.isNegative()) { | ||
handleConditionNotMet(); | ||
} else { | ||
sleepUntilNextPoll(timeLeft); | ||
} | ||
} | ||
} | ||
handleConditionNotMet(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is supposed to be called in a reconcile loop, then it is a synchronuous wait. IMHO, while-sleep-until timeout
is not a good pattern especially because timeout can be very long and it will lock the loop. It can be start in a parrallel thread but then, I think the same waiting loop can be achieve with event source. No need to poll, the dependent should trigger an event when it change and the condition will eventually be met in a latter reconcile loop. We only miss to setup to trigger an event at most when deadline occurs.
Also, the sleep loop is not resilient to an operator restart. I expect the timeout to last from the creation. If the operator stop just after the creation and restart after the timeout, then it should reconcile and immediatly report a timeout and not wait for the timeout.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @scrocquesel , pls take a look here, that explains the use case for this:
#995 (comment)
TBH, I'm also kinda thinking also to put it there or not (I mean the sync wating), to explicitly support that use cases mentioned there. Since it's just for some small subset of operators. But as described, and alos shown in the integration tests, most condition check is async or that is perferred.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, the sleep loop is not resilient to an operator restart. I expect the timeout to last from the creation. If the operator stop just after the creation and restart after the timeout, then it should reconcile and immediately report a timeout and not wait for the timeout.
That would require a state management, what we deliberatily want to avoid if possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that it's still a goal that on a reconciliation we reconcile every resource we manage. So if we want to have an analogy here the goal is not to have a state machine rather a workflow engine that is executed then on every reconciliation from beginning to end.
Well at least that is the idea :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @scrocquesel , pls take a look here, that explains the use case for this: #995 (comment)
TBH, I'm also kinda thinking also to put it there or not (I mean the sync wating), to explicitly support that use cases mentioned there. Since it's just for some small subset of operators. But as described, and alos shown in the integration tests, most condition check is async or that is perferred.
@csviri In the test below, what would happen if nginxNginxDeploymentDependentResource trigger an event before the checker has poll for the condition to be met ? Does a new reconcile is started on another thread or is it discarded because there is already a reconcile ongoing ? A deployment can take quite some time and the more the reconcile loop is running, the more the update is exposed to concurrency edition from the operator itself or from outside. What if an error occurs with the deployment and I want to delete the primary ? Does the check with the default infinite will be interrupted ?
Also, this is may works well for creation, but what should happen if the condition is unmet latter on. The reconcile will again wait in a loop and the status of the primary status will not reflect the actual state of the unmet condition.
Thinking that probably this PR just makes sense really to merge together with the "depends_on" for managed resources. In standalone mode it does not give that much. Also it might change because of that. See: #850 |
will close this, start over again from scratch based on updated design. |
No description provided.