Deep Dive: Building a Kubernetes Operator SDK for Java Developers

Operators in a Nutshell

Kubernetes provides an extension mechanism that lets us define our own resource types, called custom resources. The Operator pattern is running a pod on the cluster that watches a custom resource type and manages other resources based on it.

apiVersion: "sample.javaoperatorsdk/v1"
kind: WebPage
metadata:
  name: hello-world-page
spec:
  html: |
    <html>
      <head>
        <title>Hello Operator World</title>
      </head>
      <body>
        Hellooooo Operators!!
      </body>
    </html>

Building an SDK

We decided to create the java-operator-sdk to solve certain common problems that every operator written in Java needs to tackle. When we started to work on our first operator, it was really just a thin wrapper around the fabric8 Kubernetes API client. The two main topics we tackled later are how to efficiently handle events and how to manage consistency. The problems in these categories (which are not unrelated) are not trivial, but they can be solved by a framework. This means developers won't have to worry about these concerns anymore, and creating an operator means just implementing a simple interface.
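To give a feel for that interface, here is a rough sketch of a controller for the WebPage resource above, in the style of an early java-operator-sdk version. Treat ResourceController, UpdateControl and DeleteControl as illustrative names and check the SDK documentation for the exact API:

// Sketch of a controller for the WebPage custom resource.
// The interface and return types follow an early java-operator-sdk
// API and are meant as an illustration, not a reference.
public class WebPageController implements ResourceController<WebPage> {

    @Override
    public UpdateControl<WebPage> createOrUpdateResource(WebPage webPage, Context<WebPage> context) {
        // Create or update the resources (ConfigMap, Deployment, Service)
        // that actually serve webPage.getSpec().getHtml().
        return UpdateControl.updateStatusSubResource(webPage);
    }

    @Override
    public DeleteControl deleteResource(WebPage webPage, Context<WebPage> context) {
        // Clean up the resources this controller created for the page.
        return DeleteControl.DEFAULT_DELETE;
    }
}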

Event Scheduling

When we watch for events in Kubernetes, we receive them serially on a single thread. More precisely, all the events for a custom resource type (a 'watch' in Kubernetes terms) arrive on one thread. We want an efficient multithreaded way to process these events. In the end we defined three essential requirements for the event-scheduling algorithm:

1. Concurrently processing events for different resources.

A controller execution can take a long time. For example, we had controllers that literally took minutes because of slow responses from remote APIs, while incoming events for different resources (of the same type) are unrelated. That's why we don't want to process them one by one in a serial way. When we receive a new event, we schedule it on an executor service with a given thread pool, making the processing concurrent (see the sketch after this list).

2. Serial event processing for a single custom resource.

On the other hand, we don't want to execute events concurrently for the same resource. Imagine the following situation: we change a custom resource, a controller execution starts for the event, and then someone changes the resource again. This leads to a new event and a new controller execution. Although this is not a classic race condition in terms of shared memory between threads, there would be two threads calling all kinds of APIs in an arbitrary order. In corner cases the later event could even be processed faster than the first one, resulting in an inconsistent state.

3. Retrying failed controller executions

‘Everything fails, all the time’. A controller execution can fail for transient reasons, for example a flaky remote API, so the scheduler must be able to re-run a failed execution for the same event, ideally with a backoff.
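Here is a minimal sketch of how these requirements can be combined, using plain java.util.concurrent. This is an illustration of the idea, not the SDK's actual scheduler:

import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Simplified event scheduler: events for different resources run
// concurrently on a shared thread pool, while events for the same
// resource (keyed by uid) are chained and therefore processed serially.
class EventScheduler {

    private final ExecutorService pool = Executors.newFixedThreadPool(8);
    private final Map<String, CompletableFuture<Void>> tails = new ConcurrentHashMap<>();

    void schedule(String resourceUid, Runnable controllerExecution) {
        tails.compute(resourceUid, (uid, tail) -> {
            CompletableFuture<Void> previous =
                    tail != null ? tail : CompletableFuture.completedFuture(null);
            // Chain the new execution after the previous one for this resource.
            return previous.thenRunAsync(() -> {
                try {
                    controllerExecution.run();
                } catch (Exception e) {
                    // Requirement 3: a real scheduler would re-queue the event
                    // here with a backoff instead of just logging the failure.
                    e.printStackTrace();
                }
            }, pool);
        });
        // Note: a production version would also evict completed entries
        // from the map to avoid unbounded growth.
    }
}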

The ‘Generation’ Metadata Attribute

Kubernetes objects have a metadata attribute called ‘generation’. This is a number that the API server increments on every change to the object's .spec, but not on changes to .metadata or to the .status sub-resource. If we receive a new event but the generation did not increase, we can simply ignore that event and skip the controller execution, because in a controller implementation our main concern is the .spec part.
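A sketch of such a check; metadata.generation is a standard Kubernetes field, while the cache of last processed generations below is our own illustrative addition:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Skip events whose generation we have already processed:
// no .spec change means no controller execution is needed.
class GenerationFilter {

    private final Map<String, Long> lastProcessed = new ConcurrentHashMap<>();

    boolean shouldProcess(String uid, Long generation) {
        if (generation == null) {
            return true; // no generation info, process to be safe
        }
        Long last = lastProcessed.get(uid);
        if (last != null && generation <= last) {
            return false; // generation did not increase since last execution
        }
        lastProcessed.put(uid, generation);
        return true;
    }
}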

Consistency

We are dealing with an event-driven distributed system. The characteristics and guarantees that the Kubernetes API provides already help a lot with event processing, and the event scheduling discussed above solves many issues. Still, we have to deal with some additional ones to manage consistency:

‘At Least Once’ and Idempotency

The controller implementation should be idempotent. The reason is that we cannot guarantee exactly-once processing of an event, even if retries are turned off. In other words, we are dealing with an ‘at least once’ guarantee here, so executing the controller multiple times for the same spec must lead to the same result.
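For example, a controller that manages a ConfigMap should converge to the desired state no matter how many times it runs. A sketch using the fabric8 client, assuming a WebPage class whose getSpec().getHtml() matches the YAML above (exact fluent calls may vary by client version):

import io.fabric8.kubernetes.api.model.ConfigMap;
import io.fabric8.kubernetes.api.model.ConfigMapBuilder;
import io.fabric8.kubernetes.client.KubernetesClient;

// Idempotent reconciliation: running this once or five times for the
// same WebPage spec leaves the cluster in the same state.
class WebPageReconciler {

    private final KubernetesClient client;

    WebPageReconciler(KubernetesClient client) {
        this.client = client;
    }

    void reconcile(WebPage webPage) {
        // Build the desired state purely from the spec...
        ConfigMap desired = new ConfigMapBuilder()
                .withNewMetadata()
                    .withName(webPage.getMetadata().getName() + "-html")
                    .withNamespace(webPage.getMetadata().getNamespace())
                .endMetadata()
                .addToData("index.html", webPage.getSpec().getHtml())
                .build();
        // ...and converge to it whether or not the ConfigMap already exists.
        client.configMaps()
                .inNamespace(webPage.getMetadata().getNamespace())
                .createOrReplace(desired);
    }
}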

Deleting Resources and Finalizers

Kubernetes finalizers help us make sure we don't miss a delete event. Picture a situation where the operator crashes or isn't running while a custom resource gets deleted. When the operator starts again and registers its watch, it won't receive delete events for resources that were removed earlier, so we would end up in an inconsistent state. A finalizer on the custom resource prevents Kubernetes from actually removing it until the operator has done its cleanup and removed the finalizer.
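A sketch of the finalizer handling; the finalizer name and the helper methods are our own illustrative choices:

import io.fabric8.kubernetes.api.model.ObjectMeta;

// Illustrative finalizer handling for the WebPage resource.
class FinalizerHandler {

    static final String FINALIZER = "webpages.sample.javaoperatorsdk/finalizer";

    // Before doing any work for a resource, make sure the finalizer is
    // present, so a later delete cannot complete without us noticing.
    void ensureFinalizer(WebPage webPage) {
        ObjectMeta meta = webPage.getMetadata();
        if (!meta.getFinalizers().contains(FINALIZER)) {
            meta.getFinalizers().add(FINALIZER);
            updateResource(webPage);
        }
    }

    // Called when deletionTimestamp is set: clean up first, then remove
    // the finalizer so Kubernetes can actually delete the resource.
    void onDelete(WebPage webPage) {
        cleanUpManagedResources(webPage);
        webPage.getMetadata().getFinalizers().remove(FINALIZER);
        updateResource(webPage);
    }

    private void cleanUpManagedResources(WebPage webPage) {
        // delete the ConfigMap, Deployment, Service, ...
    }

    private void updateResource(WebPage webPage) {
        // write the modified resource back via the Kubernetes API
    }
}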

The Lost Update Problem and Optimistic Locking

Custom resources can be updated through the Kubernetes API at any time. In addition, at the end of an event processing we might want to write some changes back to the custom resource, meaning the operator will do an update, too. If that update is based on a stale copy of the resource, it can silently overwrite a change made in the meantime; this is the lost update problem. Optimistic locking prevents it: every Kubernetes object carries a resourceVersion, and the API server rejects an update with a conflict (HTTP 409) when the version we send no longer matches the one stored on the server.
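A sketch of a conflict-aware update with the fabric8 client; the fluent calls shown match recent client versions but may differ in yours:

import io.fabric8.kubernetes.client.KubernetesClient;
import io.fabric8.kubernetes.client.KubernetesClientException;
import java.util.function.UnaryOperator;

// Retry an update on optimistic-locking conflicts (HTTP 409).
class ConflictRetryingUpdater {

    private final KubernetesClient client;

    ConflictRetryingUpdater(KubernetesClient client) {
        this.client = client;
    }

    WebPage update(WebPage webPage, UnaryOperator<WebPage> mutation) {
        while (true) {
            try {
                // replace() sends the resourceVersion we read, so the API
                // server rejects the call if someone updated it meanwhile.
                return client.resources(WebPage.class)
                        .inNamespace(webPage.getMetadata().getNamespace())
                        .withName(webPage.getMetadata().getName())
                        .replace(mutation.apply(webPage));
            } catch (KubernetesClientException e) {
                if (e.getCode() != 409) {
                    throw e;
                }
                // Conflict: re-read the latest version and try again.
                webPage = client.resources(WebPage.class)
                        .inNamespace(webPage.getMetadata().getNamespace())
                        .withName(webPage.getMetadata().getName())
                        .get();
            }
        }
    }
}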

Run Single Instance

We should always run a single instance (think: a single pod) of an operator. If multiple instances of the same operator were running and listening to events from the same set of resources, we would face issues similar to those discussed in point 2 of the Event Scheduling section: both processes would receive the same event, and two separate processes would make changes in an arbitrary order based on the same custom resource specs.

What About Downtime?

What happens if the pod of the operator crashes? Kubernetes will restart it, and when we register the watch we receive the most recent state of all the resources again. This way we also process changes that were made while the operator was not running.

Conclusion

We hope you now have a taste of the core problems the framework has to deal with. More enhancements and features are on the way. Please let us know if you have any questions or feedback.
