Exploring Serverless Kubernetes Operators for JOSDK
In the past we had to implement Kubernetes operators that had some interesting characteristics. We wanted to manage non-Kubernetes resources as part of our custom GitOps solution. These resources included git repositories and related user management, as well as Jenkins pipelines. Since this was a solution on top of OpenShift, we decided to manage those resources with operators. We represented them as custom resources and implemented the related operators with an early version of Java Operator SDK (JOSDK). Quite commonly, after we onboarded a team onto our solution, which basically meant creating the mentioned custom resources, those resources rarely changed. As a result our operators were idle most of the time, not receiving any events.
Operators are used for managing resources inside and outside of the cluster, and it is typical that there are no events to process for a custom resource for long periods of time. Meanwhile, however, the operator keeps consuming cluster resources. Note that in an operator we tend to watch and cache the state of dependent resources (usually using Informers), which can lead to increased memory consumption. There is also a tendency to deploy operators in a highly available setup; in that case multiple instances of the same operator are running, consuming even more resources.
With Knative becoming more and more mature, the question naturally arises whether it would make sense to implement and deploy (at least some types of) operators in a serverless way, what problems we would face, and how hard it would be to achieve.
We will explore this topic and the related challenges, propose solutions for them, and look at what support for such serverless operators would look like. Note that we will mostly focus on the use case where the operator needs to be scaled down to zero. However, we will briefly touch on the other side of the story, that is, scaling up the number of instances under high load; as it turns out, except for some (non-trivial) problems, the solution for that would be very similar.
The assumption is that the reader is familiar with the basics of operators and Knative.
The Big Picture in a Nutshell
Deploying a serverless operator is quite straightforward: it is deployed as a Knative Service, and Knative Serving handles all aspects of pod scaling. Events regarding the custom resource are received as HTTP POST requests (with a CloudEvent as payload) from an ApiServerSource, and they trigger the reconciliation. If there are no subsequent requests, the operator pod instances are scaled down to zero after some time. To listen to events for related Kubernetes resources we can add additional ApiServerSources. External events can be forwarded to the service by a Knative broker.
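To make the flow concrete, here is a minimal sketch of the receiving side in Java, using only the JDK's built-in HTTP server. It assumes that events arrive in CloudEvents binary mode (attributes in ce- prefixed headers) and that the ce-subject header identifies the resource. The class name CloudEventEndpoint and its methods are made up for this illustration and are not part of JOSDK or Knative.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.util.function.Consumer;

/** Sketch (hypothetical class): an HTTP endpoint that turns delivered CloudEvents into reconciliations. */
final class CloudEventEndpoint {

    /**
     * Decides the HTTP status for one delivery. ceType and ceSubject are the values of the
     * "ce-type" and "ce-subject" headers (null when absent). Treating the subject as the
     * resource identifier is an assumption of this sketch.
     */
    static int handle(String method, String ceType, String ceSubject, Consumer<String> reconciler) {
        if (!"POST".equals(method)) {
            return 405; // events are delivered as HTTP POST
        }
        if (ceType == null || ceSubject == null) {
            return 400; // not an event we can map to a resource
        }
        try {
            reconciler.accept(ceSubject); // reconcile synchronously, inside the request
            return 200;
        } catch (RuntimeException e) {
            return 500; // report the failure to the sender as an HTTP error
        }
    }

    /** Wires handle() into the JDK's built-in HTTP server. */
    static HttpServer start(int port, Consumer<String> reconciler) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/", exchange -> {
            exchange.getRequestBody().readAllBytes(); // drain the payload; unused in this sketch
            int status = handle(
                    exchange.getRequestMethod(),
                    exchange.getRequestHeaders().getFirst("ce-type"),
                    exchange.getRequestHeaders().getFirst("ce-subject"),
                    reconciler);
            exchange.sendResponseHeaders(status, -1); // -1 means no response body
            exchange.close();
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws IOException {
        // Knative Serving passes the port to listen on in the PORT environment variable.
        int port = Integer.parseInt(System.getenv().getOrDefault("PORT", "8080"));
        start(port, subject -> System.out.println("reconciling " + subject));
    }
}
```

The sketch ignores the event payload and reconciles inside the request, which keeps it short; a real implementation would likely look up the resource from the payload or from the API server.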
In this section we will revisit the requirements for an operator framework that we already discussed for JOSDK in one of our previous posts, and take a look at how we could implement them in a serverless way.
In general, one of the most complex topics in implementing a serverless operator is concurrency. The challenging requirement here is handling multiple parallel events related to a particular custom resource. In other words, if a reconciliation is already in progress when a new event is received, the system should wait until the current processing is finished and then reconcile again.
If the operator is not serverless, this is handled within the boundaries of a single process, since currently all operator frameworks run only one primary operator instance. The received event is stored in memory, and the controller waits until the running reconciliation is finished; after that it simply reconciles again.
Luckily Knative allows us to set an upper bound on scaling, so we can make sure there is at most one operator instance running. Based on this we can differentiate two use cases:
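The single-process behaviour described above can be sketched as a small piece of per-resource bookkeeping. This is only an illustration of the idea, not how JOSDK implements it; the class ReconcileGate and its method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch (hypothetical class): serializes reconciliations per resource inside one process. */
final class ReconcileGate {
    private enum State { RUNNING, RUNNING_WITH_PENDING_EVENT }

    private final Map<String, State> states = new HashMap<>();

    /** Called for every event; returns true if the caller should start reconciling now. */
    synchronized boolean onEvent(String resourceId) {
        if (!states.containsKey(resourceId)) {
            states.put(resourceId, State.RUNNING);
            return true;
        }
        // A reconciliation is in flight: remember that something changed, but do not start another one.
        states.put(resourceId, State.RUNNING_WITH_PENDING_EVENT);
        return false;
    }

    /** Called when a reconciliation ends; returns true if it has to be run once more. */
    synchronized boolean onReconcileFinished(String resourceId) {
        if (states.get(resourceId) == State.RUNNING_WITH_PENDING_EVENT) {
            states.put(resourceId, State.RUNNING);
            return true;
        }
        states.remove(resourceId);
        return false;
    }
}
```

Any number of events that arrive during a running reconciliation collapse into a single follow-up run, which is enough because a reconciliation always works on the latest state.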
- Running at most one instance of an operator. From the concurrency perspective this is the trivial case: concurrency can be handled the same way as in the current implementation.
- Running possibly multiple instances of an operator. The main problem with this approach is that it’s harder to make sure that there are no concurrent reconciliations running, since we cannot use in-memory locks as we do with one instance.
There are multiple solutions to this problem. One is to use a pessimistic lock, that is, to mark the custom resource as locked while a reconciliation is running, and to start reconciling the next event only once the lock is removed or has expired.
Another way to solve this is to make sure all events of a particular custom resource are processed by the same instance. We won’t explore the technical details of such coordination now; we just have to bear in mind that this problem exists.
While the first case solves the problem of scaling the operator down from one instance to zero to save resources, the second case solves the problem of scaling up under high load. Those are two fundamentally different issues.
Caching is another fundamental property where serverless operators might differ. If we assume that the operator is started only when there is an event, and that in most cases it is scaled back to zero instances after the reconciliation, in-memory caches no longer make sense. A serverless operator will therefore refresh the state of dependent resources by calling the Kubernetes API server or other relevant APIs. This does not cause any functional issues, but it might lead to less efficient communication with those APIs.
On the other hand, the operator won’t hold a persistent connection to the Kubernetes API to watch the resource. (Although in the current implementation of ApiServerSource a pod is created that maintains this watch anyway.)
When scaling up, and also in some middle-ground situations (when we expect an operator to run for longer periods of time), caching would still work much as it does in non-serverless operators, so it definitely makes sense to cache the state of dependent resources. One key difference is that we still want reconciliation to be triggered by ApiServerSources rather than by informers, for the case when the operator is scaled down to zero.
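A rough sketch of the locking logic follows. It is deliberately simplified: an in-memory map stands in for the lock marker that would be stored on the custom resource, so it only shows the acquire, expire and release rules, not the coordination between pods. In a real cluster the marker would have to be written through the Kubernetes API, and the API server's rejection of conflicting updates would have to provide the atomicity that the map provides here. The class ExpiringResourceLock is hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Sketch (hypothetical class): a pessimistic lock with expiry, keyed by custom resource. */
final class ExpiringResourceLock {
    private static final class Lease {
        final String owner;
        final long expiresAtMillis;

        Lease(String owner, long expiresAtMillis) {
            this.owner = owner;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    // Stand-in for a lock marker persisted on the custom resource itself.
    private final ConcurrentMap<String, Lease> leases = new ConcurrentHashMap<>();
    private final long leaseMillis;

    ExpiringResourceLock(long leaseMillis) {
        this.leaseMillis = leaseMillis;
    }

    /** Succeeds if the resource is unlocked or the previous lease has expired at nowMillis. */
    boolean tryLock(String resourceId, String owner, long nowMillis) {
        Lease candidate = new Lease(owner, nowMillis + leaseMillis);
        Lease result = leases.compute(resourceId, (id, current) ->
                current == null || current.expiresAtMillis <= nowMillis ? candidate : current);
        return result == candidate;
    }

    /** Releases the lock, but only if it is still held by the given owner. */
    void unlock(String resourceId, String owner) {
        leases.computeIfPresent(resourceId, (id, current) ->
                current.owner.equals(owner) ? null : current);
    }
}
```

The expiry protects against an instance that dies while holding the lock; the owner check keeps an instance whose lease has expired from releasing a lock that another instance has taken in the meantime.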
Generation Filter Support
Usually operators want to reconcile on a custom resource event only when the .metadata.generation attribute has increased since the last reconciliation, that is, when the spec of the custom resource has changed. It is possible to compare the new version of a custom resource with the version in the cache and see if the generation has increased. (This is not an ideal solution anyway: when an operator restarts, the cached version is lost, so the operator has to reconcile all the resources again.) For a serverless operator that does not rely on caching, it is even more important to follow the pattern where the last reconciled generation is persisted in the status of the custom resource. Usually this field is .status.observedGeneration. This common pattern also works nicely for serverless operators.
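The check itself is tiny. A sketch, assuming both values have already been read from the custom resource (the class and method names are made up for this example):

```java
/** Sketch (hypothetical class): decide whether a custom resource event needs a reconciliation. */
final class GenerationFilter {

    /**
     * generation is .metadata.generation; observedGeneration is .status.observedGeneration,
     * or null if the resource has never been reconciled.
     */
    static boolean needsReconcile(long generation, Long observedGeneration) {
        return observedGeneration == null || generation > observedGeneration;
    }
}
```

For example, needsReconcile(3, 2L) is true, while needsReconcile(3, 3L) is false, so an event that only reflects a status update is skipped. Because both values live on the resource, the check needs no cache and survives the pod being scaled down to zero.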
Error Handling and Retries
Knative supports automatic retries for event delivery: if the HTTP POST that delivers an event returns an HTTP error, the delivery is retried automatically. (Note that we need a broker or channel for this functionality.)
It’s nice that this is supported out of the box. On the other hand, the operator can return a response only after the reconciliation is done, which can take a long time in some cases. In other words, until now the operator received events from a watch, which is inherently asynchronous event processing, whereas the handling of a Knative event returns synchronously.
An interesting problem here is how to cancel a retry when, during the back-off period, another event source triggers a reconciliation that succeeds. JOSDK currently does this; it does not seem trivial to support here, although it is only an optimization.
An alternative approach would be to store the received event in a Kubernetes resource (think of a ConfigMap or a dedicated custom resource) and process it asynchronously. After the event is stored, an HTTP response is returned immediately for the Knative event, and the stored event is processed subsequently. We won’t go into details regarding this approach; it would give the framework more room to operate regarding retries, but it would also bring more complexity.
Some operators do periodic reconciliation. Fortunately Knative supports PingSource, which makes this easy to implement.
Rescheduling Reconciliation with a Delay
A more common way to initiate a reconciliation after some time period is to use “RequeueAfter” or “RescheduleAfter” (the former is from controller-runtime, the latter from JOSDK).
Such an event source is currently not available in Knative; however, it would not be complex to implement one that covers this functionality transparently, using the same API in the code.
In addition, PingSource could cover most of the use cases, namely when we just want to reconcile periodically and the reschedule delay is static.
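To give an idea of how little state such a delay source would need, here is a sketch of its bookkeeping. It is purely hypothetical: no such Knative source exists today, the class DelayedTriggers is invented for this example, and the choice that a newer request replaces an older one for the same resource is an assumption. Delivering the due triggers as events is left out.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

/** Sketch (hypothetical class): remembers when each resource should be reconciled again. */
final class DelayedTriggers {
    private final Map<String, Long> dueAtByResource = new HashMap<>();

    /** Registers a reschedule request; a newer request for the same resource replaces the older one. */
    synchronized void rescheduleAfter(String resourceId, long nowMillis, long delayMillis) {
        dueAtByResource.put(resourceId, nowMillis + delayMillis);
    }

    /** Removes and returns (sorted by id) the resources whose delay has elapsed at nowMillis. */
    synchronized List<String> pollDue(long nowMillis) {
        List<String> due = new ArrayList<>();
        Iterator<Map.Entry<String, Long>> it = dueAtByResource.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            if (entry.getValue() <= nowMillis) {
                due.add(entry.getKey());
                it.remove();
            }
        }
        Collections.sort(due);
        return due;
    }
}
```

The time is passed in explicitly to keep the sketch deterministic; a real source would poll with the current time and send one event per due resource to the operator's endpoint.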
ApiServerSource supports filtering events based on labels. Label selectors are what we support currently.
Additional complex and/or custom filters can be supported at the code level. In the serverless case this is more costly, since the pod has to be started even when the event ends up being filtered out and no reconciliation happens, whereas a filter in the source drops the event before it is propagated.
For Kubernetes events we can use ApiServerSource to react to changes of Kubernetes objects of any kind. What about non-Kubernetes resources?
Is Knative a Requirement?
Well, not necessarily; it just makes most of the aspects much easier to implement. To get an idea of how scaling would look without serverless, see this blog post. All the mentioned aspects, like handling external events or periodic reconciliation, could be implemented without Knative; however, that would mean a significant effort.
Supporting serverless operators with Knative, with the constraint that we run at most one instance, would be relatively simple to implement. It seems we could quite easily support almost all the features out of the box, just using Knative Serving and Eventing capabilities. While the operator is scaled down to zero it won’t consume any resources from the cluster. However, we have to use at least ApiServerSources, which themselves consume cluster resources. The question is: is it worth it? Or would it be enough to optimise the operators themselves, for example by evicting caches after some time if there are no new events related to a custom resource? A nice read on this topic is this blog post, which gives an insight into this aspect.
On the other hand, scaling the operators up to more than one instance would bring additional complexity around concurrency, but apart from that all the other aspects are mostly identical to the first case.