Skip to content

Leader Election #1358

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 31 commits into from
Aug 24, 2022
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
31 commits
Select commit Hold shift + click to select a range
e0f3009
feat: leader election
csviri Jul 19, 2022
5a19f86
refactor: event processor to controller
csviri Jul 21, 2022
9837ccc
wip
csviri Jul 21, 2022
f035fb3
wip
csviri Jul 22, 2022
3f3923c
migrate to v6
csviri Jul 25, 2022
dc3f2a9
leader election impl
csviri Aug 1, 2022
866f06a
leader election IT
csviri Aug 1, 2022
4edd392
format
csviri Aug 1, 2022
53cd42f
e2e test
csviri Aug 3, 2022
5a1cdff
e2e improvements
csviri Aug 3, 2022
bb4eb9d
pod startup timeout
csviri Aug 3, 2022
f0fddf3
pod
csviri Aug 3, 2022
f81be31
image pull policy
csviri Aug 3, 2022
382cfd1
disable test for local mode
csviri Aug 3, 2022
c2d2211
docs
csviri Aug 3, 2022
d29f51b
fixes on rebase
csviri Aug 5, 2022
1e4f64c
docs: improve wording
metacosm Aug 22, 2022
6361b36
fix: match other methods' usage pattern
metacosm Aug 23, 2022
440bf0f
refactor: make sure all constructors cascade
metacosm Aug 23, 2022
0cd2a90
docs: improve wording
metacosm Aug 23, 2022
93a260a
refactor: move identity generation to configuration
metacosm Aug 23, 2022
b09bf5c
refactor: isLeaderElectionOn -> isLeaderElectionEnabled
metacosm Aug 23, 2022
951d28d
refactor: restore LifecycleAware compatibility
metacosm Aug 23, 2022
a2b79d8
Revert "refactor: move identity generation to configuration"
csviri Aug 24, 2022
df6f2ef
removed builder
csviri Aug 24, 2022
e7a80ee
no builder in e2e test
csviri Aug 24, 2022
f8e2211
put back test constraint
csviri Aug 24, 2022
5dbef65
improve expressiveness on constat
csviri Aug 24, 2022
aec94c2
update stat with optimistic locking
csviri Aug 24, 2022
e58457f
fix config override sequence issue
csviri Aug 24, 2022
24e2e3f
added back test run condition
csviri Aug 24, 2022
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions .github/workflows/e2e-test.yml
Original file line number Diff line number Diff line change
Expand Up @@ -21,6 +21,7 @@ jobs:
- "sample-operators/mysql-schema"
- "sample-operators/tomcat-operator"
- "sample-operators/webpage"
- "sample-operators/leader-election"
runs-on: ubuntu-latest
steps:
- name: Checkout
Expand Down
13 changes: 13 additions & 0 deletions docs/documentation/features.md
Original file line number Diff line number Diff line change
Expand Up @@ -683,6 +683,19 @@ See also
the [integration test](https://github.com/java-operator-sdk/java-operator-sdk/blob/ec37025a15046d8f409c77616110024bf32c3416/operator-framework/src/test/java/io/javaoperatorsdk/operator/sample/changenamespace/ChangeNamespaceTestReconciler.java)
for this feature.

## Leader Election

Operators are generally deployed with a single running or active instance. However, it is
possible to deploy multiple instances in such a way that only one, called the "leader", processes the
events. This is achieved via a mechanism called "leader election". While all the instances are
running, and even start their event sources to populate the caches, only the leader will process
the events. This means that should the leader change for any reason, for example because it
crashed, the other instances are already warmed up and ready to pick up where the previous
leader left off should one of them become elected leader.

See sample configuration in the [E2E test](https://github.com/java-operator-sdk/java-operator-sdk/blob/144947d89323f1c65de6e86bd8b9a6a8ffe714ff/sample-operators/leader-election/src/main/java/io/javaoperatorsdk/operator/sample/LeaderElectionTestOperator.java#L26-L30)
.

## Monitoring with Micrometer

## Automatic Generation of CRDs
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,81 @@
package io.javaoperatorsdk.operator;

import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import io.javaoperatorsdk.operator.processing.Controller;

/**
* Not to be confused with the controller manager concept from Go's controller-runtime project. In
* JOSDK, the equivalent concept is {@link Operator}.
*/
class ControllerManager {

private static final Logger log = LoggerFactory.getLogger(ControllerManager.class);

@SuppressWarnings("rawtypes")
private final Map<String, Controller> controllers = new HashMap<>();
private boolean started = false;

public synchronized void shouldStart() {
if (started) {
return;
}
if (controllers.isEmpty()) {
throw new OperatorException("No Controller exists. Exiting!");
}
}

public synchronized void start(boolean startEventProcessor) {
controllers().parallelStream().forEach(c -> c.start(startEventProcessor));
started = true;
}

public synchronized void stop() {
controllers().parallelStream().forEach(closeable -> {
log.debug("closing {}", closeable);
closeable.stop();
});
started = false;
}

public synchronized void startEventProcessing() {
controllers().parallelStream().forEach(Controller::startEventProcessing);
}

@SuppressWarnings({"unchecked", "rawtypes"})
synchronized void add(Controller controller) {
final var configuration = controller.getConfiguration();
final var resourceTypeName = ReconcilerUtils
.getResourceTypeNameWithVersion(configuration.getResourceClass());
final var existing = controllers.get(resourceTypeName);
if (existing != null) {
throw new OperatorException("Cannot register controller '" + configuration.getName()
+ "': another controller named '" + existing.getConfiguration().getName()
+ "' is already registered for resource '" + resourceTypeName + "'");
}
controllers.put(resourceTypeName, controller);
}

@SuppressWarnings("rawtypes")
synchronized Optional<Controller> get(String name) {
return controllers().stream()
.filter(c -> name.equals(c.getConfiguration().getName()))
.findFirst();
}

@SuppressWarnings("rawtypes")
synchronized Collection<Controller> controllers() {
return controllers.values();
}

synchronized int size() {
return controllers.size();
}
}

Original file line number Diff line number Diff line change
@@ -0,0 +1,85 @@
package io.javaoperatorsdk.operator;

import java.util.UUID;
import java.util.concurrent.CompletableFuture;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import io.fabric8.kubernetes.client.KubernetesClient;
import io.fabric8.kubernetes.client.extended.leaderelection.LeaderCallbacks;
import io.fabric8.kubernetes.client.extended.leaderelection.LeaderElectionConfig;
import io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector;
import io.fabric8.kubernetes.client.extended.leaderelection.LeaderElectorBuilder;
import io.fabric8.kubernetes.client.extended.leaderelection.resourcelock.LeaseLock;
import io.fabric8.kubernetes.client.extended.leaderelection.resourcelock.Lock;
import io.javaoperatorsdk.operator.api.config.ConfigurationServiceProvider;
import io.javaoperatorsdk.operator.api.config.LeaderElectionConfiguration;

public class LeaderElectionManager {

private static final Logger log = LoggerFactory.getLogger(LeaderElectionManager.class);

private LeaderElector leaderElector = null;
private final ControllerManager controllerManager;
private String identity;
private CompletableFuture<?> leaderElectionFuture;

public LeaderElectionManager(ControllerManager controllerManager) {
this.controllerManager = controllerManager;
}

public void init(LeaderElectionConfiguration config, KubernetesClient client) {
this.identity = identity(config);
Lock lock = new LeaseLock(config.getLeaseNamespace(), config.getLeaseName(), identity);
// releaseOnCancel is not used in the underlying implementation
leaderElector = new LeaderElectorBuilder(client,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This API is quite awkward, imo… maybe we should improve it in the Fabric8 client project?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, yes, this could be better there.

ConfigurationServiceProvider.instance().getExecutorService())
.withConfig(
new LeaderElectionConfig(lock, config.getLeaseDuration(), config.getRenewDeadline(),
config.getRetryPeriod(), leaderCallbacks(), true, config.getLeaseName()))
.build();
}

public boolean isLeaderElectionEnabled() {
return leaderElector != null;
}

private LeaderCallbacks leaderCallbacks() {
return new LeaderCallbacks(this::startLeading, this::stopLeading, leader -> {
log.info("New leader with identity: {}", leader);
});
}

private void startLeading() {
controllerManager.startEventProcessing();
}

private void stopLeading() {
log.info("Stopped leading for identity: {}. Exiting.", identity);
// When leader stops leading the process ends immediately to prevent multiple reconciliations
// running parallel.
// Note that some reconciliations might run for a very long time.
System.exit(1);
}

private String identity(LeaderElectionConfiguration config) {
String id = config.getIdentity().orElse(System.getenv("HOSTNAME"));
if (id == null || id.isBlank()) {
id = UUID.randomUUID().toString();
}
return id;
}

public void start() {
if (isLeaderElectionEnabled()) {
leaderElectionFuture = leaderElector.start();
}
}

public void stop() {
if (leaderElectionFuture != null) {
leaderElectionFuture.cancel(false);
}
}
}
Loading