
Leader election #411


Closed
shawkins opened this issue Apr 28, 2021 · 5 comments · Fixed by #1358
Assignees
Labels
feature kind/feature Categorizes issue or PR as related to a new feature.
Milestone

Comments

@shawkins
Collaborator

Related to #409: are there plans to add leader election functionality, similar to https://docs.openshift.com/container-platform/4.7/operators/operator_sdk/osdk-leader-election.html, to the Java Operator SDK?

@jmrodri jmrodri added kind/feature Categorizes issue or PR as related to a new feature. feature labels Sep 2, 2021
@csviri csviri self-assigned this Jan 10, 2022
@csviri
Collaborator

csviri commented Feb 2, 2022

@csviri csviri added this to the 3.3 milestone Jun 8, 2022
@csviri
Collaborator

csviri commented Jun 8, 2022

@shawkins I added this to the 3.3 milestone for now.

To summarize, to my understanding there are two cases where running multiple instances of an operator happens and/or is desirable. Leader election makes sure only one of them is actively reconciling, so the non-leader instances don't execute reconcilers:

  1. Minimize downtime in the following cases:
    • An updated version of the operator is being released, and the deployment first creates the new operator pod and then stops the old one. (For now, use the Recreate deployment strategy to handle this scenario.)
    • Minimize downtime after an operator crash by keeping multiple instances running at all times. However, there are multiple strategies here: if an instance is not the leader, should it still populate the caches and just not reconcile the events?
  2. Make sure fail-over operator instances are already provisioned on the cluster. With multiple instances provisioned up front, if the active operator's pod crashes it cannot happen that a replacement instance fails to start because cluster resources are unavailable.
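Both scenarios above rest on the same primitive: a lease that at most one instance holds at a time, and that a standby can take over once the holder stops renewing it. The following is a minimal, self-contained sketch of that mechanism; the class and method names are illustrative, not the java-operator-sdk or Kubernetes API (which uses a Lease object in the cluster rather than in-process state).

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical in-memory sketch of lease-based leader election.
public class LeaseElection {
    private String holder;                 // identity of the current leader, null if none
    private Instant expiry = Instant.MIN;  // when the current lease runs out
    private final Duration leaseDuration;

    public LeaseElection(Duration leaseDuration) {
        this.leaseDuration = leaseDuration;
    }

    /**
     * A candidate acquires (or renews) the lease iff it is free, expired,
     * or already held by that same candidate. Returns true if the candidate
     * is now the leader.
     */
    public synchronized boolean tryAcquire(String candidate, Instant now) {
        if (holder == null || now.isAfter(expiry) || holder.equals(candidate)) {
            holder = candidate;
            expiry = now.plus(leaseDuration);
            return true;
        }
        return false; // someone else holds a live lease; stay passive
    }

    public synchronized String holder() {
        return holder;
    }

    public static void main(String[] args) {
        LeaseElection lease = new LeaseElection(Duration.ofSeconds(15));
        Instant t0 = Instant.parse("2022-06-08T00:00:00Z");
        System.out.println(lease.tryAcquire("operator-a", t0));                 // true: a leads
        System.out.println(lease.tryAcquire("operator-b", t0.plusSeconds(5)));  // false: lease still live
        // operator-a crashes and stops renewing; after expiry, b takes over
        System.out.println(lease.tryAcquire("operator-b", t0.plusSeconds(20))); // true: fail-over
    }
}
```

This covers both cases: during a rolling update the new pod simply waits until the old pod's lease lapses, and a warm standby takes over the same way after a crash.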

In summary, there is one design question:

  • Should the non-leader operator instances activate event sources and just not trigger reconciliation until elected as leader? Or should an instance basically only start once it is elected leader? Both have pros and cons: activated event sources consume resources (possibly polling in some cases, caching resources in memory), but on the other hand they minimize downtime in case syncing the caches on startup takes a long time.
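The two options in the design question can be sketched as a startup flag on the instance. This is a hypothetical shape only (the mode names and methods are invented for illustration, not the java-operator-sdk configuration API):

```java
// Hypothetical sketch of the two non-leader startup strategies.
public class OperatorInstance {
    /** What a non-leader instance does while waiting to be elected. */
    public enum NonLeaderMode { WARM_CACHES, FULLY_PASSIVE }

    private final NonLeaderMode mode;
    private boolean eventSourcesStarted;
    private boolean reconciling;

    public OperatorInstance(NonLeaderMode mode) {
        this.mode = mode;
    }

    /** Called on startup, before this instance has won the election. */
    public void start() {
        // WARM_CACHES pre-populates caches for fast fail-over but never
        // reconciles; FULLY_PASSIVE defers everything until leadership.
        if (mode == NonLeaderMode.WARM_CACHES) {
            eventSourcesStarted = true;
        }
    }

    /** Called when this instance wins the leader election. */
    public void onStartLeading() {
        eventSourcesStarted = true; // no-op if caches were kept warm
        reconciling = true;
    }

    public boolean eventSourcesStarted() { return eventSourcesStarted; }
    public boolean isReconciling()       { return reconciling; }

    public static void main(String[] args) {
        OperatorInstance warm = new OperatorInstance(NonLeaderMode.WARM_CACHES);
        warm.start();
        System.out.println(warm.eventSourcesStarted()); // true: caches warm before leading
        System.out.println(warm.isReconciling());       // false: not leading yet
    }
}
```

The trade-off is exactly as stated in the comment: WARM_CACHES pays the memory/polling cost continuously in exchange for near-zero fail-over time, while FULLY_PASSIVE pays nothing until election but must then sync caches before reconciling.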

@metacosm
Collaborator

metacosm commented Jun 8, 2022

Maybe the strategy should be configurable, i.e., the framework would support event replication but let users activate or deactivate it depending on their needs?

@csviri
Collaborator

csviri commented Jun 9, 2022

> Maybe the strategy should be configurable i.e. the framework would support event replication but let users activate or deactivate it depending on their needs?

Yes, agreed, a feature flag would be nice for that.

@csviri
Collaborator

csviri commented Jun 9, 2022

Just one more note: in both cases, when an operator becomes the leader it will need to reconcile all the resources anyway, since there is no information about how long the previous leader was down.
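The note above can be sketched as a leadership callback that triggers a full resync rather than waiting for events. Again, the names here are hypothetical, not the framework's actual API:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: on acquiring leadership, reconcile EVERY known
// resource, because we cannot know how long the previous leader was down
// or which events were missed in the meantime.
public class LeadershipCallback {
    private final List<String> cachedResources; // stand-in for the primary-resource cache
    private final List<String> reconciled = new ArrayList<>();

    public LeadershipCallback(List<String> cachedResources) {
        this.cachedResources = cachedResources;
    }

    /** Called once when this instance wins the election. */
    public void onStartLeading() {
        // Full resync: all resources, not just those with pending events.
        for (String resource : cachedResources) {
            reconcile(resource);
        }
    }

    private void reconcile(String resource) {
        reconciled.add(resource); // real code would invoke the Reconciler here
    }

    public List<String> reconciled() {
        return reconciled;
    }

    public static void main(String[] args) {
        LeadershipCallback cb = new LeadershipCallback(List.of("cr-a", "cr-b"));
        cb.onStartLeading();
        System.out.println(cb.reconciled()); // [cr-a, cr-b]
    }
}
```

This is why the WARM_CACHES-style strategy only shortens cache sync time; it does not remove the need for the full reconciliation pass on takeover.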

@csviri csviri modified the milestones: 3.3, 3.2 Jun 20, 2022
@csviri csviri linked a pull request Jul 21, 2022 that will close this issue
metacosm added a commit that referenced this issue Aug 24, 2022
csviri added a commit that referenced this issue Aug 25, 2022
@csviri csviri closed this as completed Aug 26, 2022
csviri added a commit that referenced this issue Aug 30, 2022
csviri added a commit that referenced this issue Sep 5, 2022
Projects
None yet
Development


4 participants