Depends On and Conditions to describe workflows Dependent Resources #850

Closed
csviri opened this issue Jan 20, 2022 · 10 comments · Fixed by #1153

@csviri
Collaborator

csviri commented Jan 20, 2022

Problem statement

  1. A resource needs as input a value that is an output of another resource (in Kubernetes terms, the status of the other resource). (Ideally these would be detected automatically, but that is probably not feasible.)
  2. We want to make sure that something happens only after another resource is ready. For example, once a database is deployed, we want to create a specific default schema.

Solution

Both of these issues can be addressed with the following notions:

  1. dependsOn - If a dependent resource B depends on another dependent A, B is reconciled only after A is ready (see the optional ReadyCondition).
  2. ReconcilePrecondition - optional for a dependent. If present, it is evaluated to decide whether the resource should be reconciled or not.
  3. ReadyPostcondition - optional. If provided, the dependent is considered reconciled only once a certain state is reached. The condition check is async.
  4. DeletePostcondition - on cleanup (backwards workflow), checks whether the resource was deleted successfully (or the delete completed), or whether the cleanup execution should be rescheduled (the finalizer is not removed).
@ControllerConfiguration( ... ,
    dependents = {
       @Dependent(name = "dep1", type = DeploymentDependentResource.class, readyPostcondition = ReadyCondition.class),
       @Dependent(name = "ser1", type = ServiceDependentResource.class, reconcilePrecondition = MyCondition.class),
       @Dependent(type = OtherResource.class, reconcilePrecondition = OtherCondition.class,
              deletePostcondition = DeleteCondition.class, dependsOn = {"dep1", "ser1"})
    })
 public class MyResourceReconciler implements Reconciler<MyResource>, EventSourceInitializer<MyResource> {
   ....
 }
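
To illustrate what classes such as MyCondition or ReadyCondition above might contain, here is a minimal sketch. The Condition interface, its isMet method, and the MyResource and Deployment records are hypothetical stand-ins invented for this example; they are not the SDK's API.

```java
import java.util.Optional;

// Hypothetical condition contract (an assumption, not the actual SDK API).
interface Condition<R, P> {
    // secondary is empty when the dependent resource does not exist (yet).
    boolean isMet(P primary, Optional<R> secondary);
}

// Simplified stand-ins for a custom resource and a Kubernetes Deployment.
record MyResource(boolean exposeService) {}
record Deployment(int desiredReplicas, int readyReplicas) {}

// Reconcile precondition: reconcile the Service only if the spec asks for it.
class ExposeServiceCondition implements Condition<Object, MyResource> {
    @Override
    public boolean isMet(MyResource primary, Optional<Object> secondary) {
        return primary.exposeService();
    }
}

// Ready postcondition: the Deployment counts as ready once all replicas are ready.
class DeploymentReadyCondition implements Condition<Deployment, MyResource> {
    @Override
    public boolean isMet(MyResource primary, Optional<Deployment> secondary) {
        return secondary.map(d -> d.readyReplicas() >= d.desiredReplicas()).orElse(false);
    }
}
```

Passing both the primary and the (possibly missing) secondary resource keeps a condition a pure function of observed state, which makes it easy to evaluate repeatedly.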

Behavior Details

  • If a dependent resource B depends on A, but A has a ReconcilePrecondition that evaluates to false, then B will not be reconciled either.
  • If a ReconcilePrecondition evaluates to false on a dependent resource A and A is a Deleter, delete should be called on it. All resources that depend on A and are Deleters should be deleted as well. This should also work transitively (in reverse order, so the resource with the reconcile condition closer to the top is deleted later). (See: https://github.com/java-operator-sdk/java-operator-sdk/blob/4d63e1260efeb70ac7550d52e48117bf8982fabb/sample-operators/webpage/src/main/java/io/javaoperatorsdk/operator/sample/WebPageStandaloneDependentsReconciler.java#L68-L72 )
  • A ReadyPostcondition can define an UpdateControl to return if the condition is not yet reached, which means that waiting is effectively always async. It can also define a time delay after which the reconciliation is rescheduled.
  • For cleanup (TBD): we could dry-run the actual workflow (evaluating the conditions) and execute the deleters backwards, also taking delete postconditions into account (this might be an issue if a dependent changes in parallel with the CR deletion). Alternatively, a possibly more bulletproof way is to read the state of all the defined resources in backwards order and delete them if present.
  • Eventually a concurrency option could be implemented (maybe worth thinking about from the early design): resources that do not depend on each other could be reconciled concurrently.
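
The transitive, reverse-order deletion described in the second bullet can be sketched as a post-order walk over the "is depended on by" relation. This is only an illustration under assumptions: the class and method names are made up, dependents are identified by plain strings, the dependsOn declarations are assumed to be acyclic, and filtering for Deleter resources is left out.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of the SDK.
class PreconditionPropagation {

    // dependsOn maps each dependent's name to the names it depends on.
    // Returns the resources to delete when failed's precondition is false:
    // everything that (transitively) depends on it first, failed itself last.
    static List<String> deletionOrder(Map<String, List<String>> dependsOn, String failed) {
        List<String> order = new ArrayList<>();
        visit(failed, dependsOn, new HashSet<>(), order);
        return order;
    }

    private static void visit(String node, Map<String, List<String>> dependsOn,
                              Set<String> seen, List<String> order) {
        if (!seen.add(node)) {
            return; // already handled via another path
        }
        for (var entry : dependsOn.entrySet()) {
            if (entry.getValue().contains(node)) {
                visit(entry.getKey(), dependsOn, seen, order);
            }
        }
        // Post-order: everything depending on node is already in the list.
        order.add(node);
    }
}
```

With B depending on A and C depending on B, a false precondition on A yields the order C, B, A, so the resource closest to the top of the workflow is deleted last.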

Sync and Async Waiting Condition

Note that waiting for a state condition can happen synchronously or asynchronously. Thus, if we know that we will wait quite a long time for a deployment to be ready, the scheduling algorithm can just exit the reconciliation and schedule a new reconciliation after a specified delay (UpdateControl.noUpdate().rescheduleAfter(2, TimeUnit.MINUTES)) if the condition does not hold at the moment.

UPDATE: as discussed, sync wait will be out of scope for now.
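
A rough sketch of the async variant: reconcile A, and if A's ready condition does not hold yet, return immediately with a reschedule delay instead of blocking. The Outcome record and the reconcile method are hypothetical stand-ins for this example; in the SDK the reschedule request would be expressed through UpdateControl as quoted above.

```java
import java.time.Duration;
import java.util.Optional;
import java.util.function.BooleanSupplier;

// Hypothetical result of one reconciliation pass (stand-in for UpdateControl).
record Outcome(boolean dependentBReconciled, Optional<Duration> rescheduleAfter) {}

class AsyncWaitSketch {

    // Reconciles A, then reconciles B only if A's ready condition holds right now.
    static Outcome reconcile(Runnable reconcileA, BooleanSupplier aIsReady,
                             Runnable reconcileB, Duration retryDelay) {
        reconcileA.run();
        if (!aIsReady.getAsBoolean()) {
            // Do not block a thread: exit and ask to be called again later.
            return new Outcome(false, Optional.of(retryDelay));
        }
        reconcileB.run();
        return new Outcome(true, Optional.empty());
    }
}
```

Because each pass starts from the top, the reconciliation of A has to be idempotent; the next pass simply re-checks the condition and continues with B once it holds.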

Notes

  1. Note that without cycles the result is a DAG (more precisely, a set of DAGs), which can be nicely scheduled for execution.
  2. Cycle detection will be needed to help developers detect cycles as early as possible.
  3. This is an implementation detail, but one thing to think about is how to manage state: if it is not in the status but in a ConfigMap/Secret/CR, the reconciliations of two dependent resources in the DAG cannot run in parallel, or at least not those that update the state, so that there are no conflicts. Maybe those should be marked explicitly, as a later improvement?
  4. DeletePostcondition makes sense only for Deleter dependent resources.
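
Notes 1 and 2 can be covered by one pass of Kahn's algorithm, which produces an execution order for a DAG and detects cycles as a side effect. The sketch below is an assumption about how this could be done, not the actual implementation; WorkflowOrder and executionOrder are made-up names, and dependents are identified by plain strings.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of the SDK.
class WorkflowOrder {

    // dependsOn maps a dependent's name to the names it depends on.
    // Returns an order in which every dependent comes after its dependencies,
    // or throws if the dependsOn declarations contain a cycle.
    static List<String> executionOrder(Map<String, List<String>> dependsOn) {
        Map<String, Integer> unmet = new TreeMap<>(); // sorted for a stable result
        Map<String, List<String>> dependents = new HashMap<>();
        dependsOn.forEach((name, deps) -> {
            unmet.merge(name, deps.size(), Integer::sum);
            for (String dep : deps) {
                unmet.putIfAbsent(dep, 0);
                dependents.computeIfAbsent(dep, k -> new ArrayList<>()).add(name);
            }
        });

        // Start with the resources that depend on nothing.
        Deque<String> ready = new ArrayDeque<>();
        unmet.forEach((name, count) -> {
            if (count == 0) {
                ready.add(name);
            }
        });

        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String next = ready.poll();
            order.add(next);
            for (String dependent : dependents.getOrDefault(next, List.of())) {
                if (unmet.merge(dependent, -1, Integer::sum) == 0) {
                    ready.add(dependent);
                }
            }
        }
        // Anything left unscheduled is part of (or behind) a cycle.
        if (order.size() < unmet.size()) {
            throw new IllegalStateException("Cycle detected in dependsOn declarations");
        }
        return order;
    }
}
```

Running such a check when the controller configuration is processed would surface cycles at startup rather than during the first reconciliation.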

To Discuss

  1. DeletePostcondition could be replaced by returning a boolean from delete of Deleter.
@metacosm
Collaborator

1. If for a resource we need an input what is an output (or status in K8S of the other resource). (Ideally these would be detected automatically? like in Terraform)

How could they be detected automatically? This processing needs to happen at build time so unless you perform static analysis I don't see how it would be possible…

@csviri
Collaborator Author

csviri commented Jan 20, 2022

Yep, I don't think it's practically possible without explicitly stating them anyway. So the best that can easily be done is dependsOn. Will update the description.

@andreaTP
Collaborator

This is a good starting point to model state transitions!

Something that I'm not sure about is if it should be modelled as an Annotation at the Reconciler class level, are you going to not run the reconciling loop if the condition is not met?
This approach can probably lead to a number of edge cases ...

Another comment is around Conditions, it would be ideal if they can be more abstract than just "K8s conditions" but even data/content changes (for example monitoring the change of the revision or of the actual content).

I think that a valid almost real-world use case to test this feature against could be:

  • [user] writes a CR referencing a Secret
  • [reconciler] the creation of an associated Deployment doesn't start until the Secret is created (shows a message in the status?)
  • [user] creates the mentioned Secret
  • [reconciler] creates the associated Deployment
  • [user] updates the Secret
  • [reconciler] recognizes that the Secret changed and performs a rolling restart once

NOTE1: I understand that the first Deployment can be created independently from the Secret, but this serves as a test bench for this feature

NOTE2: Part of this feature is offered by some popular operators (e.g. Reloader), but it is arguably a common task/pattern for operators, and it would be nice if the SDK could take care of those details.
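
One common way to implement the last step of the use case (restart exactly once per Secret change) is to record a checksum of the Secret content and compare it on every reconciliation. The sketch below assumes that approach; SecretChecksum and its methods are hypothetical names, and where the checksum is stored (for example in an annotation) is left open.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of the SDK.
class SecretChecksum {

    // Computes a hex-encoded SHA-256 checksum over the Secret's key/value pairs.
    static String checksum(Map<String, String> secretData) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            // Sort keys so the same content always produces the same checksum.
            for (var entry : new TreeMap<>(secretData).entrySet()) {
                digest.update(entry.getKey().getBytes(StandardCharsets.UTF_8));
                digest.update((byte) 0); // separator between key and value
                digest.update(entry.getValue().getBytes(StandardCharsets.UTF_8));
                digest.update((byte) 0); // separator between entries
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 must be supported by every Java platform implementation.
            throw new IllegalStateException(e);
        }
    }

    // A restart is needed only if the content differs from what was recorded.
    static boolean needsRestart(String recordedChecksum, Map<String, String> secretData) {
        return !checksum(secretData).equals(recordedChecksum);
    }
}
```

After triggering the rolling restart, the reconciler would record the new checksum, so later reconciliations with unchanged content do not restart again.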

I hope this comment doesn't go too much beyond the original scope of this issue.

@csviri
Collaborator Author

csviri commented Jan 21, 2022

Something that I'm not sure about is if it should be modelled as an Annotation at the Reconciler class level, are you going to not run the reconciling loop if the condition is not met?

I guess we won't. That means it only makes sense to reconcile resource A if resource B is in a certain condition.

Another comment is around Conditions, it would be ideal if they can be more abstract than just "K8s conditions" but even data/content changes (for example monitoring the change of the revision or of the actual content).

Agreed. The condition API probably needs access to the target resource, and maybe to the whole context.

I think that a valid almost real-world use case to test this feature against could be

This use case would be perfectly covered with this design. We can create an e2e test for this.

The use case you mentioned at the community meeting, a job that is created, executed only once, and then deleted, is I think also completely covered with this: #851
But we can discuss that further.

Thank you very much for the feedback @andreaTP!

@andreaTP
Collaborator

I guess we won't. That means it only makes sense to reconcile resource A if resource B is in a certain condition.

I might be missing something, but I think that the resource A reconciler should be triggered anyhow, with appropriate parameter(s) (ready = false or something), even if resource B hasn't reached the specific condition.
This will enable:

  • waiting for multiple conditions
  • monitoring the status (and eventually reporting it in the Status field)
  • handle possible recovery actions (timeouts, retries etc.)

@andreaTP
Collaborator

This use case would be perfectly covered with this design. We can create an e2e test for this.

This would be great ❤️

@csviri
Collaborator Author

csviri commented Apr 3, 2022

Updated the design and added a description. There can now be a Condition on any resource. dependsOn means that the reconciliation happens after the reconciliation of the other resources, possibly also with a WaitCondition.

(Note that this is now consistent with the model in Terraform (or CloudFormation and others), adjusted for Kubernetes needs. See https://www.terraform.io/language/meta-arguments/depends_on and https://stackoverflow.com/questions/60231309/terraform-conditional-creation-of-a-resource-based-on-a-variable-in-tfvars )

@csviri csviri changed the title dependsOn Construct for Dependent Resources dependsOn and Conditions to describe workflows Dependent Resources Apr 3, 2022
@csviri csviri changed the title dependsOn and Conditions to describe workflows Dependent Resources Depends On and Conditions to describe workflows Dependent Resources Apr 3, 2022
@csviri
Collaborator Author

csviri commented Apr 13, 2022

Regarding exception handling but also for waitCondition and cleanupCondition, there is an interesting design option: being fail fast or not.

In other words, if the reconciliation of a resource fails but other independent resources could still be reconciled, should we proceed with them?

The same goes for wait and cleanup conditions: if a wait condition is not met, the reconciler can exit immediately (waiting, of course, for already running reconciliations to finish, since execution is parallel), or it can still reconcile the independent resources and just return the update control specified by the first wait condition that was not met.

In the long term, since the workflow can be executed concurrently (for independent resources), it might make sense not to fail fast, so that the time to reach the target state is shorter, assuming that some resources (like Deployments) might take time to become ready.
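
The two options can be contrasted in a small sketch. It runs the independent reconciliations sequentially for simplicity (the real workflow would run them in parallel), and FailFastSketch and reconcileAll are hypothetical names used only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the SDK.
class FailFastSketch {

    // Runs reconciliations of mutually independent resources.
    // Returns the names of the resources whose reconciliation failed.
    static List<String> reconcileAll(Map<String, Runnable> independent, boolean failFast) {
        List<String> failed = new ArrayList<>();
        for (var entry : independent.entrySet()) {
            try {
                entry.getValue().run();
            } catch (RuntimeException e) {
                failed.add(entry.getKey());
                if (failFast) {
                    break; // stop at the first failure
                }
                // Otherwise keep going: the other resources do not depend on this one.
            }
        }
        return failed;
    }
}
```

In the non-fail-fast mode the caller still learns about every failure, while resources that are slow to become ready (like Deployments) get created as early as possible.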

@csviri
Collaborator Author

csviri commented Apr 13, 2022

An additional detail described here:
#1150

So in the end it seems all dependent resources will provide an event source, more precisely a ResourceEventSource.
That is also beneficial here, because for Deleters it can be decided locally, based on the local cache of the event source, whether a delete (that is, the actual API call) should be made or not. This makes repeated deletes much more efficient.
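
A minimal sketch of that idea, assuming the event source keeps a local view of which resources currently exist. CachedDeleter, its callback methods, and the string-based resource names are hypothetical; the real cache would be maintained by the ResourceEventSource from watch events.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

// Hypothetical helper, not part of the SDK.
class CachedDeleter {
    // Stand-in for the event source's local cache: names of resources known to exist.
    private final Set<String> knownResources = ConcurrentHashMap.newKeySet();
    // Performs the actual delete call against the API server.
    private final Consumer<String> apiDelete;

    CachedDeleter(Consumer<String> apiDelete) {
        this.apiDelete = apiDelete;
    }

    // Called when an event reports that the resource exists.
    void onResourceSeen(String name) {
        knownResources.add(name);
    }

    // Called when an event reports that the resource is gone.
    void onResourceGone(String name) {
        knownResources.remove(name);
    }

    // Deletes only if the cache says the resource exists.
    // Returns true if an API call was made.
    boolean delete(String name) {
        if (!knownResources.contains(name)) {
            return false; // nothing to delete, skip the API call
        }
        apiDelete.accept(name);
        return true;
    }
}
```

When cleanup is rescheduled several times, for example while waiting for a DeletePostcondition, only the passes that still see the resource in the cache cause an API call.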

@csviri
Collaborator Author

csviri commented Apr 13, 2022

UPDATE: WaitCondition is removed, instead added ReadyCondition on resource.


3 participants