Using patch when updating status sub-resource (or update without optimistic locking) #525


Closed
csviri opened this issue Sep 2, 2021 · 3 comments · Fixed by #1147
Labels
feature, kind/feature

Comments

@csviri
Collaborator

csviri commented Sep 2, 2021

Currently, when we update the status subresource of a custom resource we use optimistic locking. So if the custom resource was updated while the controller was reconciling, the resource version of the CR will have changed and consequently the status update will fail with a conflict.

In case of a successful reconciliation we would like to update the status even if the custom resource spec changed in the meantime (note that even a metadata change results in a resourceVersion update). The reason is that we don't want to lose information about the managed (dependent) resources, typically about newly created ones:

Consider the situation where you create a resource in the controller that can later only be addressed by its generated ID (taken from the response of an API call); cannot think of a better example now (TODO), but say a PullRequest in the GitHub API. That ID becomes state for us: it is how we address the resource from then on when reconciling again. If we did not store this ID in the status, we would have to recreate the resource on the next run of the controller. So even if the spec changes, that may or may not affect the just-created resource, and the next run of the controller will reconcile it. But without storing the ID, the resource will simply be created again and, more importantly, the previous one will not be cleaned up.

So there are two ways to go:

  1. Use patch on the status - a patch by definition does not use optimistic locking, and it's very generic. Note that under the hood Kubernetes supports both well-known JSON patch standards (JSON Patch and JSON Merge Patch): https://kubernetes.io/docs/tasks/manage-kubernetes-objects/update-api-object-kubectl-patch/#use-a-json-merge-patch-to-update-a-deployment

  2. Update the status without optimistic locking - this will overwrite the status even if the resource version changed.

Patching is good because in some cases multiple processes may want to change the status, and concurrent patches need not result in lost updates if implemented correctly. If there aren't multiple processes changing the status, a simple update is also sufficient.

Probably it's enough to support only patch (both options are sketched below).
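To make the two options concrete, here is a minimal sketch against a recent fabric8 KubernetesClient (which JOSDK uses under the hood). `MyResource`, `MySpec` and `MyStatus` are hypothetical types, and the `patchStatus`/`replaceStatus` method names are assumptions about the client version in use; this is an illustration, not the SDK's eventual API.

```java
import io.fabric8.kubernetes.api.model.Namespaced;
import io.fabric8.kubernetes.client.CustomResource;
import io.fabric8.kubernetes.client.KubernetesClient;
import io.fabric8.kubernetes.model.annotation.Group;
import io.fabric8.kubernetes.model.annotation.Version;

// Hypothetical custom resource, only to make the sketch self-contained.
@Group("sample.javaoperatorsdk.io")
@Version("v1")
class MyResource extends CustomResource<MySpec, MyStatus> implements Namespaced {}
class MySpec {}
class MyStatus { String externalId; boolean ready; }

class StatusUpdateOptions {

  // Option 1: patch the status subresource. The patch request carries no
  // resourceVersion, so the API server applies it without the optimistic
  // locking conflict check. (patchStatus is assumed to exist in the fabric8
  // client version in use; method names differ between versions.)
  MyResource patchStatus(KubernetesClient client, MyResource cr) {
    return client.resources(MyResource.class)
        .inNamespace(cr.getMetadata().getNamespace())
        .withName(cr.getMetadata().getName())
        .patchStatus(cr);
  }

  // Option 2: plain update (PUT) of the status subresource with the
  // resourceVersion cleared. At the Kubernetes API level a PUT without a
  // resourceVersion skips the conflict check, so the status is overwritten
  // even if the resource changed in the meantime. (Whether the client passes
  // the missing resourceVersion through as-is, or fills in the latest one
  // first, depends on the client implementation.)
  MyResource updateStatusWithoutLocking(KubernetesClient client, MyResource cr) {
    cr.getMetadata().setResourceVersion(null);
    return client.resources(MyResource.class)
        .inNamespace(cr.getMetadata().getNamespace())
        .withName(cr.getMetadata().getName())
        .replaceStatus(cr);
  }
}
```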

@csviri csviri changed the title Updating Status sub-resource with Patch (or without optimistic locking) Using patch when updating status sub-resource with (or without optimistic locking) Sep 2, 2021
@csviri csviri changed the title Using patch when updating status sub-resource with (or without optimistic locking) Using patch when updating status sub-resource (or update without optimistic locking) Sep 2, 2021
@joelanford
Member

Just from my perspective, having done this with Go and controller-runtime for a few years, I think we need to be careful about adding library code that encourages operator authors to write to a primary object's status block and then expect to read from that same status to understand state.

The example brought up where this kind of pattern seems useful is when a reconciler creates some off-cluster resource and doesn't get a reference to it until after it has already been created. Naively, it seems like storing that reference in the primary object's status is perfect, because the assumption is that the next reconciliation will see it and decide not to try to create it again.

A better alternative is to define secondary resource types that represent off-cluster objects. Then when a reconciler is running, it can use a label selector to find these secondary resources. Not coincidentally, this is generally how workload controllers keep track of their underlying pods (e.g. replica sets use their spec.selector to locate pods that are part of the replica set, and they don't record pod references in their status).

In general, the pattern all controllers should follow (and that, by extension, SDK code should enshrine) is the following (see the sketch after this comment):

  1. Read desired state from object spec
  2. Read current state from cluster
  3. Apply changes to move towards desired state
  4. Write object status with state of reconciliation

In this case, the off-cluster state would be represented as a secondary object's spec. A nice consequence of this design is that you could write a completely separate controller that keeps the status of the off-cluster resource up-to-date.

Using the example of a GitHub PR, a separate controller could open a watch connection with github (or poll if watching isn't supported) to update the PR resource with information about the PR status (open/merged/closed, checks passing or failing, labels, assignees, etc.).

This design encourages separation of concerns and makes individual reconcilers more composable and less complex.
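A rough sketch of that four-step pattern as a JOSDK Reconciler, reusing the hypothetical `MyResource`/`MySpec`/`MyStatus` types from the sketch in the first comment. Exact signatures (e.g. whether `Context` is generic, `updateStatus` vs. `patchStatus`) differ between SDK versions, so treat this as an outline rather than the prescribed API.

```java
import io.javaoperatorsdk.operator.api.reconciler.Context;
import io.javaoperatorsdk.operator.api.reconciler.Reconciler;
import io.javaoperatorsdk.operator.api.reconciler.UpdateControl;

public class MyReconciler implements Reconciler<MyResource> {

  @Override
  public UpdateControl<MyResource> reconcile(MyResource primary, Context context) {
    // 1. Read desired state from the primary object's spec.
    MySpec desired = primary.getSpec();

    // 2. Read current state from the cluster, e.g. list the secondary resources
    //    that represent the off-cluster objects via a label selector, instead of
    //    looking up references stored in the primary's status.

    // 3. Apply changes to move towards the desired state (create, update or
    //    delete secondary resources and the off-cluster objects they describe).

    // 4. Write the outcome of the reconciliation to the primary's status; the
    //    status reports state, it is not read back to re-discover dependents.
    MyStatus status = new MyStatus();
    status.ready = true;
    primary.setStatus(status);
    return UpdateControl.updateStatus(primary);
  }
}
```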

@csviri
Collaborator Author

csviri commented Sep 2, 2021

@joelanford thank you very much for this feedback! This is pure gold.

I've just realized that we are actually mixing two things in this issue:

  1. the way we are updating the status - basically update with optimistic locking versus patch (this should be the main topic)
  2. managing state after reconciliation (this probably just popped up because of the example I gave)

(1) Regardless of whether we manage state in the status or not (I'm convinced now that it is a smell), the status can be updated by multiple processes. This is why we use optimistic locking: to prevent lost updates. See also the earlier discussion here:

https://kubernetes.slack.com/archives/CAW0GV7A5/p1619703141379800

IMHO it still makes sense to do this using patch in general: if different processes are updating different fields in the status, patch is better since there won't be conflicts and throughput will be better. It will also prevent unnecessary executions of the reconciliation loop in some situations, see issue #409 - this might be specific to how JOSDK handles automatic retries when an exception is thrown from the controller.

As suggested in that discussion, if there is a real conflict, in the sense that a process other than the controller is making changes to the same attribute of the status, we could just have "last writer wins" (depending on the domain this may or may not be acceptable).
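A tiny illustration of that point, with hypothetical status field names: each process patches only the field it owns, so concurrent merge patches don't step on each other.

```java
// Purely illustrative: two hypothetical processes each send a JSON merge patch
// to the /status subresource containing only the field they own. The patches
// merge at the field level, so neither overwrites the other and no
// resourceVersion conflict occurs; with full status updates (PUT), the last
// writer would overwrite the other's field unless optimistic locking rejected it.
class ConcurrentStatusPatches {
  static final String PATCH_FROM_RECONCILER =
      "{\"status\":{\"observedGeneration\":4}}";
  static final String PATCH_FROM_EXTERNAL_SYNC =
      "{\"status\":{\"externalState\":\"MERGED\"}}";
}
```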

(2) Just one note: in the past we typically implemented such systems exactly this way, with a main controller and lots of external resources managed by other (sub-)controllers, and the main controller watching for changes of the dependent custom resources. It was about 4-5 simple dependent resources, and in the end we had 5-6 controllers. We wanted to do it in a nice "micro-controller" way :) But this also created a lot of overhead; in the end we concluded that breaking it down into multiple controllers was much more complex than having a simple implementation and making the separation of concerns at the code level (without involving additional controllers and custom resources) within a single controller. But then we had to store the state somewhere else... so the status seemed to be a simple, naive alternative :)

@jmrodri jmrodri added the kind/feature and feature labels Sep 9, 2021
@csviri
Collaborator Author

csviri commented Apr 19, 2022

Patch support has been added recently, closing this issue.
See: #1147

@csviri csviri closed this as completed Apr 19, 2022
@csviri csviri linked a pull request Apr 19, 2022 that will close this issue