Using patch when updating status sub-resource (or update without optimistic locking) #525
Just from my perspective, having done this with Go and controller-runtime for a few years, I think we need to be careful about adding library code that encourages operator authors to write to a primary resource's status block and then expect to read from that same status to understand state. The example brought up where this kind of pattern seems useful is when a reconciler creates some off-cluster resource and doesn't get a reference to it until after it has already been created. Naively, storing that reference in the primary object's status seems perfect, because the assumption is that the next reconciliation will see it and decide not to try to create it again. A better alternative is to define secondary resource types that represent off-cluster objects. Then, when a reconciler is running, it can use a label selector to find these secondary resources. Not coincidentally, this is generally how workload controllers keep track of their underlying pods (e.g. replica sets use their label selectors). In general, the pattern all controllers should follow (and by extension the pattern SDK code should enshrine) is to derive state by observing the cluster, not by reading back what was previously written to status.
In this case, the off-cluster state would be represented as a secondary object's spec. A nice consequence of this design is that you could write a completely separate controller that keeps the status of the off-cluster resource up-to-date. Using the example of a GitHub PR, a separate controller could open a watch connection with GitHub (or poll if watching isn't supported) to update the PR resource with information about the PR status (open/merged/closed, checks passing or failing, labels, assignees, etc.). This design encourages separation of concerns and makes individual reconcilers more composable and less complex.
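A minimal Java sketch of the secondary-resource idea described above, assuming a hypothetical PullRequest custom resource class and the fabric8 KubernetesClient that JOSDK builds on; the label key and value are illustrative only:

```java
import io.fabric8.kubernetes.client.KubernetesClient;
import java.util.List;

// Sketch: look up secondary custom resources that represent off-cluster objects
// via a label selector instead of reading IDs back from the primary's status.
// PullRequest is a hypothetical custom resource class (extends CustomResource).
public class PullRequestLookup {

  public List<PullRequest> findOwnedPullRequests(
      KubernetesClient client, String namespace, String primaryName) {
    return client.resources(PullRequest.class)
        .inNamespace(namespace)
        .withLabel("example.com/owned-by", primaryName) // illustrative label key
        .list()
        .getItems();
  }
}
```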
@joelanford thank you very much for this feedback! This is pure gold. I've just realized that we are actually mixing two things in this issue:
(1) Regardless of whether we manage state in status or not (I'm convinced now that it is a smell), it can be updated by multiple processes. This is why we use optimistic locking: to prevent lost updates. See also the earlier discussion here: https://kubernetes.slack.com/archives/CAW0GV7A5/p1619703141379800 IMHO it still makes sense to do this using patch in general; if different processes are updating different fields in status, patching is better since there won't be conflicts and throughput will be better. It will also prevent unnecessary executions of the reconciliation loop in some situations, see issue #409 - this might be specific to how JOSDK handles automatic retries when an exception is thrown from the controller. As suggested in that discussion, if there is a real conflict, in the sense that a process other than the controller is changing the same attribute of the status, we could just have "last writer wins" (depending on the domain this might or might not be acceptable).
(2) Just one note: in the past we typically implemented such a system exactly this way, with a main controller and lots of external resources managed by other (sub-)controllers, where the main controller was watching for changes of the dependent custom resources. It was about 4-5 simple dependent resources, so in the end we had 5-6 controllers. We wanted to do it in a nice "micro-controller" way :) But this also created a lot of overhead; in the end we concluded that breaking it down into multiple controllers was much more complex than having a simple implementation and doing the separation of concerns at the code level (not involving additional controllers and custom resources) within a single controller. But then we had to store the state somewhere else... so status seemed to be a simple, naive alternative :)
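A minimal JOSDK-flavored sketch of point (1), assuming a JOSDK version where UpdateControl.patchStatus is available; MyCustomResource, its status class, and the observedGeneration field are hypothetical placeholders:

```java
import io.javaoperatorsdk.operator.api.reconciler.Context;
import io.javaoperatorsdk.operator.api.reconciler.Reconciler;
import io.javaoperatorsdk.operator.api.reconciler.UpdateControl;

// MyCustomResource and its status class are hypothetical placeholders.
public class MyCustomResourceReconciler implements Reconciler<MyCustomResource> {

  @Override
  public UpdateControl<MyCustomResource> reconcile(
      MyCustomResource resource, Context<MyCustomResource> context) {
    // ... reconcile dependent resources here ...

    // Record what was observed/created on the in-memory copy's status.
    resource.getStatus().setObservedGeneration(resource.getMetadata().getGeneration());

    // Ask the framework to patch only the status subresource instead of doing
    // a full update with optimistic locking, so that a concurrent spec or
    // metadata change does not make the status update fail on a
    // resourceVersion conflict.
    return UpdateControl.patchStatus(resource);
  }
}
```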
Patch has been added in the meantime, closing this issue.
Currently, when we update the status subresource of a custom resource we use optimistic locking. So if there was an update on the custom resource while the controller was reconciling, the resourceVersion of the CR will have changed, and consequently the update of the status will fail on a conflict.
In case of a successful reconciliation we would like to update the status even if the custom resource spec changed in the meantime (note that even a metadata change results in a resourceVersion update). The reason is that we don't want to lose information about the managed (dependent) resources, typically about created ones:
Consider the situation where you create a resource in the controller that can later only be addressed by its generated ID (from the response of an API call). (Cannot think of a better example now (TODO): something like a Pull Request in the GitHub API.) The ID becomes state for us; this is how we address the resource from then on when reconciling again. If we did not store this ID in the status, we would have to recreate the resource on the next run of the controller. So even if the spec changed, this might or might not have an effect on the just-created resource; the controller will reconcile it on the next run. But without storing the ID, a new resource would simply be created and, more importantly, the previous one would not be cleaned up.
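To make the scenario concrete, here is a minimal sketch of storing the generated ID in status so the next reconciliation can address the external resource instead of recreating it. PullRequestResource, PullRequestStatus, and GitHubClient are all hypothetical placeholders for the external API:

```java
// Hypothetical status subresource storing the generated external ID.
class PullRequestStatus {
  private String externalId; // ID returned by the external API on creation

  public String getExternalId() { return externalId; }
  public void setExternalId(String externalId) { this.externalId = externalId; }
}

// Sketch of the reconciliation logic for the external resource.
public class ExternalResourceReconciliation {

  private final GitHubClient gitHubClient; // made-up client for the external API

  public ExternalResourceReconciliation(GitHubClient gitHubClient) {
    this.gitHubClient = gitHubClient;
  }

  // Only create the external resource if no ID has been recorded in status yet;
  // otherwise address the existing one by its stored ID.
  public void reconcileExternalResource(PullRequestResource resource) {
    String externalId = resource.getStatus().getExternalId();
    if (externalId == null) {
      externalId = gitHubClient.createPullRequest(resource.getSpec());
      resource.getStatus().setExternalId(externalId);
      // If this status write is lost (e.g. due to an optimistic-locking
      // conflict), the next reconciliation would create a second pull request
      // and the first one would never be cleaned up.
    } else {
      gitHubClient.updatePullRequest(externalId, resource.getSpec());
    }
  }
}
```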
So there are two ways to go (a sketch of the patch option follows below):
Use patch on the status - by definition it does not use optimistic locking, and it's very generic. Note that under the hood Kubernetes supports both well-known JSON patch standards: https://kubernetes.io/docs/tasks/manage-kubernetes-objects/update-api-object-kubectl-patch/#use-a-json-merge-patch-to-update-a-deployment
Update the status without optimistic locking - this will overwrite the status even if the resource version changed.
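For illustration, a minimal sketch of the patch option at the level of the fabric8 client that JOSDK builds on; MyResource is a hypothetical custom resource class, and the exact status methods (patchStatus, updateStatus, replaceStatus) and client builder differ between fabric8 versions:

```java
import io.fabric8.kubernetes.client.KubernetesClient;
import io.fabric8.kubernetes.client.KubernetesClientBuilder;

public class StatusPatchExample {

  // Patch only the status subresource of the already-reconciled resource.
  public void patchStatus(MyResource reconciled) {
    try (KubernetesClient client = new KubernetesClientBuilder().build()) {
      client.resources(MyResource.class)
          .inNamespace(reconciled.getMetadata().getNamespace())
          .withName(reconciled.getMetadata().getName())
          // A patch does not carry a resourceVersion check, so it does not
          // fail with a conflict if the spec or metadata changed in the
          // meantime, and untouched status fields are left as they are.
          .patchStatus(reconciled);
      // The second option (update without optimistic locking) corresponds at
      // the API level to a status replace whose object has no resourceVersion
      // set; the API server then applies it unconditionally (last writer wins).
    }
  }
}
```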
Patching is good because in some cases multiple processes might want to change the status, and concurrent patches should not result in lost updates if implemented correctly. If there aren't multiple processes changing the status, a simple update is also sufficient.
Probably it's enough to support only Patch.