Depends On and Conditions to describe workflows Dependent Resources #850

csviri · 2022-01-20T14:14:48Z

Problem statement

If a resource needs as input an output of another resource (in K8S terms, that resource's status). (Ideally these dependencies would be detected automatically, but that is probably not feasible.)
When we want to make sure that something happens after another resource is ready, for example, when a database is deployed and we want to create a specific default schema.

Solution

Both of these issues can be addressed with the following notions:

dependsOn - If a dependent resource B depends on another dependent resource A, B is reconciled only after A is ready (see the optional ReadyPostcondition).
ReconcilePrecondition - optional for a dependent; if present, it is evaluated to decide whether the resource should be reconciled or not.
ReadyPostcondition - optional; if provided, the dependent is considered reconciled only once a certain state is reached. The condition check is asynchronous.
DeletePostcondition - on cleanup (the backwards workflow), checks whether the resource was deleted successfully (i.e., the delete completed) or whether the cleanup execution should be rescheduled (the finalizer is not removed).
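As a minimal sketch, the four notions could be modelled as plain Java types. All names here (Condition, DependentSpec and their methods) are hypothetical and not part of the SDK; a missing condition is represented by null and treated as "always satisfied".

```java
import java.util.List;

// Hypothetical condition type: evaluated against the primary resource P and the
// dependent's actual state R (null if the resource does not exist).
@FunctionalInterface
interface Condition<R, P> {
    boolean isMet(P primary, R actual);
}

// Hypothetical description of one dependent in the workflow.
final class DependentSpec<R, P> {
    final String name;
    final List<String> dependsOn;                 // names of dependents reconciled (and ready) first
    final Condition<R, P> reconcilePrecondition;  // null: always reconcile
    final Condition<R, P> readyPostcondition;     // null: ready as soon as reconciled
    final Condition<R, P> deletePostcondition;    // null: delete treated as complete

    DependentSpec(String name, List<String> dependsOn, Condition<R, P> reconcilePrecondition,
                  Condition<R, P> readyPostcondition, Condition<R, P> deletePostcondition) {
        this.name = name;
        this.dependsOn = List.copyOf(dependsOn);
        this.reconcilePrecondition = reconcilePrecondition;
        this.readyPostcondition = readyPostcondition;
        this.deletePostcondition = deletePostcondition;
    }

    boolean shouldReconcile(P primary, R actual) {
        return reconcilePrecondition == null || reconcilePrecondition.isMet(primary, actual);
    }

    boolean isReady(P primary, R actual) {
        return readyPostcondition == null || readyPostcondition.isMet(primary, actual);
    }

    // false means: keep the finalizer and reschedule the cleanup
    boolean isDeleteComplete(P primary, R actual) {
        return deletePostcondition == null || deletePostcondition.isMet(primary, actual);
    }
}
```

For example, a Deployment dependent could use a ready postcondition such as (primary, replicas) -> replicas != null && replicas >= 3, where the second argument stands in for the observed number of ready replicas.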

@ControllerConfiguration(
    // ... other configuration ...
    dependents = {
        @Dependent(name = "dep1", type = DeploymentDependentResource.class,
                readyPostcondition = ReadyCondition.class),
        @Dependent(name = "ser1", type = ServiceDependentResource.class,
                reconcilePrecondition = MyCondition.class),
        @Dependent(type = OtherResource.class, reconcilePrecondition = OtherCondition.class,
                deletePostcondition = DeleteCondition.class, dependsOn = {"dep1", "ser1"})
    })
public class MyResourceReconciler implements Reconciler<MyResource>, EventSourceInitializer<MyResource> {
    // ...
}

Behavior Details

If a dependent resource B depends on A, but A's ReconcilePrecondition evaluates to false, then B will not be reconciled either.
If a ReconcilePrecondition evaluates to false on a dependent resource A that is a Deleter, delete should be called on it. All the resources that depend on A and are Deleters should be deleted as well. This should also work transitively, in reverse order, so that the resource whose reconcile condition is closer to the top is deleted later. (See: https://github.com/java-operator-sdk/java-operator-sdk/blob/4d63e1260efeb70ac7550d52e48117bf8982fabb/sample-operators/webpage/src/main/java/io/javaoperatorsdk/operator/sample/WebPageStandaloneDependentsReconciler.java#L68-L72 )
A ReadyPostcondition can define an UpdateControl to return if the condition is not yet reached; this means that waiting is effectively always asynchronous. It can also define a time delay after which the reconciliation is rescheduled.
For cleanup (TBD), we could dry-run the actual workflow (evaluating the conditions) and execute the Deleters backwards, also taking delete postconditions into account (this might be an issue if a dependent changes concurrently with the CR deletion). Alternatively, a more bulletproof approach might be to simply read the state of all the defined resources in backwards order and delete them if present.
Eventually, a concurrency option could be implemented (it may be worth keeping in mind from the early design on): dependents that do not depend on each other could be reconciled concurrently.
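The first two rules can be sketched as a graph computation, assuming the workflow is given as a map from each dependent's name to the names listed in its dependsOn, and that the set of Deleter dependents is known. The class and method names are hypothetical. A depth-first post-order over the reversed dependsOn edges yields every dependent that transitively depends on the failed one, each emitted only after all of its own dependents, which is the deletion order described above.

```java
import java.util.*;

// Hypothetical helper: `dependsOn` maps each dependent's name to the names it depends on,
// mirroring the dependsOn attribute; `deleters` are the dependents implementing Deleter.
final class PreconditionFailure {

    // `failed` plus every dependent that transitively depends on it, dependents first.
    // None of these are reconciled while the precondition evaluates to false.
    static List<String> affected(Map<String, List<String>> dependsOn, String failed) {
        Map<String, List<String>> dependents = new HashMap<>();
        dependsOn.forEach((name, deps) -> {
            for (String dep : deps) {
                dependents.computeIfAbsent(dep, k -> new ArrayList<>()).add(name);
            }
        });
        List<String> postOrder = new ArrayList<>();
        visit(failed, dependents, new HashSet<>(), postOrder);
        return postOrder;
    }

    // Deleters among the affected dependents, in deletion order: the one closest to the
    // failed precondition (nearer the top of the workflow) is deleted last.
    static List<String> deleteOrder(Map<String, List<String>> dependsOn, String failed,
                                    Set<String> deleters) {
        List<String> order = new ArrayList<>(affected(dependsOn, failed));
        order.removeIf(name -> !deleters.contains(name));
        return order;
    }

    // Depth-first post-order (assumes the graph is acyclic): a node is emitted only
    // after everything that depends on it has been emitted.
    private static void visit(String node, Map<String, List<String>> dependents,
                              Set<String> seen, List<String> out) {
        if (!seen.add(node)) {
            return;
        }
        for (String child : dependents.getOrDefault(node, List.of())) {
            visit(child, dependents, seen, out);
        }
        out.add(node);
    }
}
```

For a chain where "b" depends on "a" and "c" depends on "b", a false precondition on "a" affects c, b, a (in that order); if only "a" and "c" are Deleters, "c" is deleted before "a".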

Sync and Async Waiting Condition

Note that waiting for a state condition can happen synchronously or asynchronously. If we know that we will wait quite a long time for a Deployment to become ready, the scheduling algorithm can simply exit the reconciliation and schedule a new reconciliation after a specified delay (UpdateControl.noUpdate().rescheduleAfter(2, TimeUnit.MINUTES)) if the condition does not hold at the moment.
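A minimal sketch of the asynchronous variant, assuming a hypothetical WaitOutcome record as a stand-in for the SDK's UpdateControl and ready postconditions modelled as BooleanSupplier: the dependents are walked in workflow order and, at the first unmet ready condition, the method returns a reschedule request instead of blocking the thread.

```java
import java.time.Duration;
import java.util.*;
import java.util.function.BooleanSupplier;

// Hypothetical stand-in for UpdateControl: either the workflow is complete, or the
// reconciliation should be triggered again after a delay.
record WaitOutcome(boolean complete, Duration rescheduleAfter) {
    static WaitOutcome finished() { return new WaitOutcome(true, Duration.ZERO); }
    static WaitOutcome retryIn(Duration delay) { return new WaitOutcome(false, delay); }
}

final class AsyncWait {
    // Reconciles dependents in order; on the first ready condition that does not hold,
    // returns immediately (no blocking) and asks to be rescheduled after `delay`.
    static WaitOutcome reconcileInOrder(List<String> order, Map<String, BooleanSupplier> readyConditions,
                                        Duration delay, List<String> reconciled) {
        for (String name : order) {
            reconciled.add(name); // placeholder for actually reconciling this dependent
            BooleanSupplier ready = readyConditions.get(name);
            if (ready != null && !ready.getAsBoolean()) {
                return WaitOutcome.retryIn(delay);
            }
        }
        return WaitOutcome.finished();
    }
}
```

With the database example from the problem statement, reconcileInOrder(List.of("db", "schema", "app"), ...) stops after "db" while the database is not ready; the next reconciliation, triggered after the delay, walks the same order again.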

UPDATE: as discussed, synchronous waiting is out of scope for now.

Notes

Note that, without cycles, the result is a DAG (more precisely, a set of DAGs), which can be nicely scheduled for execution.
Cycle detection will be needed to help developers detect cycles as early as possible.
This is an implementation detail, but one thing to think about is how to manage state: if it is kept not in the status but in a ConfigMap/Secret/CR, two dependent resource reconciliations in the DAG cannot be executed in parallel, or at least not those that update the state, so that there are no conflicts. Maybe those should be marked explicitly, as a later improvement?
DeletePostcondition makes sense only for Deleter dependent resources.
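Ordering and cycle detection can be sketched with Kahn's algorithm, assuming dependents are identified by name and the graph is the map from each name to its dependsOn list (the class name is hypothetical). Dependents with no remaining dependencies are emitted first; any dependents never emitted lie on a cycle or depend on one, and are reported so the error surfaces at startup rather than at reconciliation time. Unknown names in dependsOn are rejected as well.

```java
import java.util.*;

final class WorkflowOrder {
    // dependsOn: dependent name -> names of the dependents it depends on.
    // Returns an order in which every dependent comes after all of its dependencies.
    static List<String> topologicalOrder(Map<String, List<String>> dependsOn) {
        Map<String, Integer> inDegree = new TreeMap<>(); // sorted for a deterministic order
        Map<String, List<String>> dependents = new HashMap<>();
        for (var entry : dependsOn.entrySet()) {
            String name = entry.getKey();
            inDegree.merge(name, 0, Integer::sum);
            for (String dep : entry.getValue()) {
                if (!dependsOn.containsKey(dep)) {
                    throw new IllegalArgumentException(name + " depends on unknown dependent " + dep);
                }
                inDegree.merge(name, 1, Integer::sum);
                dependents.computeIfAbsent(dep, k -> new ArrayList<>()).add(name);
            }
        }
        Deque<String> ready = new ArrayDeque<>();
        inDegree.forEach((name, degree) -> { if (degree == 0) ready.add(name); });
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String next = ready.poll();
            order.add(next);
            for (String dependent : dependents.getOrDefault(next, List.of())) {
                if (inDegree.merge(dependent, -1, Integer::sum) == 0) {
                    ready.add(dependent);
                }
            }
        }
        if (order.size() != inDegree.size()) {
            Set<String> stuck = new TreeSet<>(inDegree.keySet());
            stuck.removeAll(order);
            throw new IllegalStateException("cycle detected involving: " + stuck);
        }
        return order;
    }
}
```

Independent DAGs need no special handling: their roots all start with zero in-degree, and dependents that end up in the ready queue together are the ones that could, in principle, be reconciled concurrently.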

To Discuss

DeletePostcondition could be replaced by returning a boolean from the delete method of Deleter.
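If DeletePostcondition were replaced by a boolean returned from delete, the cleanup loop could look like the following sketch. The CompletingDeleter interface and the Cleanup class are hypothetical; the deleters are assumed to be passed in reverse workflow order, and cleanup stops at the first delete that is not yet complete, so that dependencies are not deleted before their dependents are gone.

```java
import java.util.*;

// Hypothetical variant of Deleter whose delete reports completion,
// replacing a separate DeletePostcondition.
@FunctionalInterface
interface CompletingDeleter<P> {
    // true: delete completed; false: not yet, keep the finalizer and reschedule cleanup
    boolean delete(P primary);
}

final class Cleanup {
    // Returns true when every delete completed, i.e. the finalizer may be removed.
    static <P> boolean run(List<CompletingDeleter<P>> deletersInReverseOrder, P primary) {
        for (CompletingDeleter<P> deleter : deletersInReverseOrder) {
            if (!deleter.delete(primary)) {
                return false; // stop here: the remaining deleters must wait for this one
            }
        }
        return true;
    }
}
```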


metacosm · 2022-01-20T14:31:35Z

1. If a resource needs as input an output of another resource (in K8S terms, that resource's status). (Ideally these would be detected automatically? like in Terraform)

How could they be detected automatically? This processing needs to happen at build time, so unless you perform static analysis, I don't see how it would be possible…

csviri · 2022-01-20T14:33:13Z

Yep, I don't think it's practically possible without stating them explicitly anyway. So the best that can easily be done is dependsOn. I will update the description.

andreaTP · 2022-01-21T14:18:04Z

This is a good starting point to model state transitions!

Something I'm not sure about is whether it should be modelled as an annotation at the Reconciler class level: are you going to skip the reconciliation loop if the condition is not met?
This approach can probably lead to a number of edge cases ...

Another comment is about Conditions: it would be ideal if they could be more abstract than just "K8s conditions" and also cover data/content changes (for example, monitoring a change of the revision or of the actual content).

I think that a valid almost real-world use case to test this feature against could be:

[user] writes a CR referencing a Secret
[reconciler] the creation of an associated Deployment doesn't start until the Secret is created (shows a message in the status?)
[user] creates the mentioned Secret
[reconciler] creates the associated Deployment
[user] updates the Secret
[reconciler] recognizes that the Secret changed and performs a rolling restart once
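One common way to implement the last step (a single rolling restart when the Secret changes), used for example in Helm charts via a checksum annotation, is to store a hash of the Secret data in the Deployment's pod template annotations: changing the pod template makes Kubernetes roll out the pods once. The sketch below assumes Java 17 (for HexFormat); the annotation key `example.com/secret-hash` and the class name are hypothetical, and the Secret data is modelled as a plain Map<String, String>.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.*;

final class SecretHash {
    static final String ANNOTATION = "example.com/secret-hash"; // hypothetical annotation key

    // SHA-256 over the entries in sorted key order, so the hash is independent of map order.
    // A 0 byte separates fields (Secret data values are base64 strings, so they contain none).
    static String hash(Map<String, String> secretData) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            for (var e : new TreeMap<>(secretData).entrySet()) {
                md.update(e.getKey().getBytes(StandardCharsets.UTF_8));
                md.update((byte) 0);
                md.update(e.getValue().getBytes(StandardCharsets.UTF_8));
                md.update((byte) 0);
            }
            return HexFormat.of().formatHex(md.digest());
        } catch (NoSuchAlgorithmException ex) {
            throw new IllegalStateException(ex);
        }
    }

    // Desired pod template annotations: the current ones plus the Secret's hash.
    static Map<String, String> desiredTemplateAnnotations(Map<String, String> current,
                                                          Map<String, String> secretData) {
        Map<String, String> desired = new HashMap<>(current);
        desired.put(ANNOTATION, hash(secretData));
        return desired;
    }

    // True if the Deployment's template must be updated, which triggers one rollout.
    static boolean needsRollout(Map<String, String> currentTemplateAnnotations,
                                Map<String, String> secretData) {
        return !hash(secretData).equals(currentTemplateAnnotations.get(ANNOTATION));
    }
}
```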

NOTE1: I understand that the first Deployment can be created independently from the Secret, but this serves as a test bench for this feature

NOTE2: Part of this feature is offered by some popular operators (e.g. Reloader), but it is arguably a common task/pattern for operators, and it would be nice if the SDK could take care of those details.

I hope this comment doesn't go too much beyond the original scope of this issue.

csviri · 2022-01-21T14:59:14Z

Something I'm not sure about is whether it should be modelled as an annotation at the Reconciler class level: are you going to skip the reconciliation loop if the condition is not met?

I guess we won't; that means it only makes sense to reconcile resource A if resource B is in a certain condition.

Another comment is about Conditions: it would be ideal if they could be more abstract than just "K8s conditions" and also cover data/content changes (for example, monitoring a change of the revision or of the actual content).

Agreed. The condition probably needs access to the target resource, but maybe the whole context should be accessible from the condition API.

I think that a valid almost real-world use case to test this feature against could be

This use case would be perfectly covered with this design. We can create an e2e test for this.

The use case you mentioned at the community meeting, where a job is created, executed only once, and then deleted, is I think also completely covered by this: #851
But we can discuss that further.

Thank you very much for the feedback, @andreaTP!

andreaTP · 2022-01-21T15:14:54Z

I guess we won't; that means it only makes sense to reconcile resource A if resource B is in a certain condition.

I might be missing something, but I think that the reconciler for resource A should be triggered anyhow, with appropriate parameter(s) (ready = false or something similar), even if resource B hasn't reached the specific condition.
This will enable:

waiting for multiple conditions
monitoring the status (and eventually reporting it in the Status field)
handle possible recovery actions (timeouts, retries etc.)
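A minimal sketch of the status-reporting part of this proposal, assuming the reconciler receives the evaluated state of each condition it waits on instead of being skipped. ConditionState and WaitingStatus are hypothetical names, and the resulting message would be written to the custom resource's status.

```java
import java.util.*;

// Hypothetical: the evaluated state of one condition the reconciler is waiting on.
record ConditionState(String name, boolean met) {}

final class WaitingStatus {
    // The reconciler still runs while some conditions are unmet, so it can report them
    // (and, separately, decide on timeouts or retries).
    static String message(List<ConditionState> states) {
        List<String> pending = new ArrayList<>();
        for (ConditionState s : states) {
            if (!s.met()) {
                pending.add(s.name());
            }
        }
        return pending.isEmpty() ? "Ready" : "Waiting for: " + String.join(", ", pending);
    }
}
```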

andreaTP · 2022-01-21T15:16:07Z

This use case would be perfectly covered with this design. We can create an e2e test for this.

This would be great ❤️

csviri · 2022-04-03T14:08:35Z

Updated the design and also added a description. There can now be a Condition on any resource. dependsOn means that the reconciliation happens after the reconciliation of the other resources, optionally with a WaitCondition.

(Note that this is now consistent with the model in Terraform (or CloudFormation and others), adjusted for Kubernetes needs; see https://www.terraform.io/language/meta-arguments/depends_on and https://stackoverflow.com/questions/60231309/terraform-conditional-creation-of-a-resource-based-on-a-variable-in-tfvars )

csviri · 2022-04-13T07:51:17Z

Regarding exception handling, but also waitCondition and cleanupCondition, there is an interesting design choice: whether to fail fast or not.

In other words, if the reconciliation of a resource fails but other, independent resources could still be reconciled, should we proceed with them?

The same applies to wait and cleanup conditions: if a wait condition is not met, the reconciler can exit immediately (of course after waiting for already running reconciliations to finish, since execution is parallel), or it can still reconcile the independent resources and just return the update control specified by the first wait condition that was not met.

In the long term, since the workflow can be executed concurrently (for independent resources), it might make sense not to fail fast, so that the time to reach the target state is shorter, assuming that some resources (like Deployments) might take time to become ready.
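The trade-off can be sketched with a hypothetical runner over independent dependents (Step, WorkflowResult and IndependentRunner are illustrative names, not SDK types). For simplicity it runs the steps sequentially rather than concurrently; in non-fail-fast mode it keeps going after errors and unmet wait conditions, collects the errors, and returns the delay of the first unmet wait condition as the reschedule request.

```java
import java.time.Duration;
import java.util.*;

// Hypothetical: reconciling one dependent either throws, returns empty (ready),
// or returns the delay after which its wait condition should be checked again.
@FunctionalInterface
interface Step {
    Optional<Duration> reconcile() throws Exception;
}

record WorkflowResult(List<String> attempted, List<String> errors, Optional<Duration> rescheduleAfter) {}

final class IndependentRunner {
    static WorkflowResult run(LinkedHashMap<String, Step> steps, boolean failFast) {
        List<String> attempted = new ArrayList<>();
        List<String> errors = new ArrayList<>();
        Optional<Duration> reschedule = Optional.empty();
        for (Map.Entry<String, Step> entry : steps.entrySet()) {
            attempted.add(entry.getKey());
            try {
                Optional<Duration> wait = entry.getValue().reconcile();
                if (wait.isPresent()) {
                    if (reschedule.isEmpty()) {
                        reschedule = wait; // the first unmet wait condition decides the reschedule
                    }
                    if (failFast) break;
                }
            } catch (Exception e) {
                errors.add(entry.getKey() + ": " + e.getMessage());
                if (failFast) break;
            }
        }
        return new WorkflowResult(attempted, errors, reschedule);
    }
}
```

In the fail-fast mode, a slow Deployment that is not yet ready prevents the remaining independent dependents from being reconciled in this pass; without fail-fast they make progress in the meantime.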

csviri · 2022-04-13T08:05:53Z

An additional detail described here:
#1150

So, in the end, it seems all dependent resources will provide an event source, more precisely a ResourceEventSource.
That is also beneficial here, because for Deleters it can be decided locally, based on the event source's cache, whether a delete (i.e., the actual API call) should be made or not, which makes repeated deletes much more efficient.
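A minimal sketch of the cache check, assuming the event source's cache can be queried by resource name; here a Map stands in for the cache and a Consumer for the actual API call, and CacheAwareDeleter is a hypothetical name. Since the cache is only updated once the delete event arrives, a repeated delete issued before that may still hit the API, so the delete call itself should remain idempotent.

```java
import java.util.*;
import java.util.function.Consumer;

final class CacheAwareDeleter {
    private final Map<String, ?> cache;        // stand-in for the event source cache
    private final Consumer<String> apiDelete;  // stand-in for the actual API delete call

    CacheAwareDeleter(Map<String, ?> cache, Consumer<String> apiDelete) {
        this.cache = cache;
        this.apiDelete = apiDelete;
    }

    // Returns true if a delete request was sent to the API server.
    boolean delete(String resourceName) {
        if (!cache.containsKey(resourceName)) {
            return false; // already absent according to the cache: skip the API call
        }
        apiDelete.accept(resourceName);
        return true;
    }
}
```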

csviri · 2022-04-13T10:29:42Z

UPDATE: WaitCondition is removed, instead added ReadyCondition on resource.

csviri added the dependent-resources-epic label Jan 20, 2022

csviri self-assigned this Jan 20, 2022

csviri mentioned this issue Jan 20, 2022

Custom action for dependent resources #851

Closed

csviri added the feature label Jan 20, 2022

csviri mentioned this issue Jan 21, 2022

Architectural Decision/Discussion: Layering Architecture Regarding Dependent Resources #858

Closed

csviri mentioned this issue Feb 22, 2022

Accessing Other Dependent Resources for a Managed Dependent Resource #964

Closed

scrocquesel mentioned this issue Feb 24, 2022

Discussion around a Conditions api abstraction #971

Closed

This was referenced Mar 4, 2022

Support for wait for a resource state condition #995

Closed

WIP: feat: depends on wait condition design #994

Closed

csviri mentioned this issue Apr 3, 2022

Make for a dependent to decide whether the associated resource should be created dynamically #1116

Closed

csviri added the workflows label Apr 3, 2022

csviri mentioned this issue Apr 3, 2022

Workflows for Dependent Resource - Umbrella Issue #1097

Closed

6 tasks

csviri changed the title ~~dependsOn Construct for Dependent Resources~~ dependsOn and Conditions to describe workflows Dependent Resources Apr 3, 2022

csviri changed the title ~~dependsOn and Conditions to describe workflows Dependent Resources~~ Depends On and Conditions to describe workflows Dependent Resources Apr 3, 2022

This was referenced Apr 5, 2022

The Deleter and Garbage Collector issuer with Kuberentes Dependent Resources #1127

Closed

Creating doc with basic patterns in controllers we should consider for architecture #1095

Closed

csviri linked a pull request Apr 13, 2022 that will close this issue

Workflow engine implementation #1153

Merged

csviri mentioned this issue Apr 13, 2022

This week spent time on Operator Workflow Engine Design and Implementation snowdrop/team#804

Open

csviri closed this as completed May 26, 2022

metacosm added this to the 3.1 milestone May 31, 2022

This was referenced Jun 2, 2022

DependentResource and Workflow Design Details Questions #1263

Closed

Define workflows with dependent resource annotations #1241

Closed
