An operator is a controller that manages an application on Kubernetes. By providing full lifecycle management for the application, it helps the SRE team automate infrastructure changes such as deployment, updates, and scaling. This blog is the first in a series of articles about using StreamNative Pulsar Operators on Kubernetes to better manage applications. In it, I demonstrate how to use the Pulsar Operators to deploy Pulsar on Kubernetes.
Before I introduce the specific installation steps, let’s take a look at the three sets of Operators provided by StreamNative.
- Pulsar Operators. Kubernetes controllers that provide a declarative API to simplify the deployment and management of Pulsar clusters on Kubernetes. Specifically, there are three Operators:
  - Pulsar Operator. Manages the deployment of the Pulsar broker and Pulsar proxy for the Pulsar cluster.
  - BookKeeper Operator. Provides full lifecycle management for the BookKeeper cluster.
  - ZooKeeper Operator. Provides full lifecycle management for the ZooKeeper cluster.
- Pulsar Resources Operator. An independent controller that automatically manages Pulsar resources (for example, tenants, namespaces, topics, and permissions) on Kubernetes through manifest files.
- Function Mesh Operator. Integrates different functions to process data.
You can find the three sets of Operators in the streamnative and function-mesh Helm chart repositories respectively. If you haven’t added these two repositories, you need to use the helm repo add command to add them first before you search them for the operators.
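As a rough sketch, adding and searching the two repositories might look like the following. The repository URLs are my assumption and are not given in this post, so confirm them against the StreamNative and Function Mesh documentation before use.

```shell
# Assumed chart repository URLs (not stated in this post); verify before use.
STREAMNATIVE_REPO_URL="https://charts.streamnative.io"
FUNCTION_MESH_REPO_URL="https://charts.functionmesh.io/"

# Register both repositories, refresh the local index, then list their charts.
add_operator_repos() {
  helm repo add streamnative "$STREAMNATIVE_REPO_URL" &&
    helm repo add function-mesh "$FUNCTION_MESH_REPO_URL" &&
    helm repo update &&
    helm search repo streamnative &&
    helm search repo function-mesh
}

# Run only where Helm is installed.
if command -v helm >/dev/null 2>&1; then
  add_operator_repos || echo "could not add the repositories; check the URLs" >&2
fi
```

The search output should list the operator charts along with their chart and app versions.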
For a quick start, you can follow the official installation documentation. This blog explores a step-by-step way to set up a Pulsar cluster by deploying its key components separately through the Pulsar Operators.
Fetch and check the Pulsar Operators Helm chart
1. Instead of using “helm install” to deploy the chart from the repository directly, I fetched the chart to check its details in this example.
2. The command helm fetch with the --untar option downloads the chart template to your local machine. Let’s check the chart file.
3. The chart file describes the basic information of the Helm chart, such as the maintainer and app version. The current chart supports Kubernetes versions 1.16 to 1.23. As my existing Kubernetes version is 1.24.0, if I run “helm install” directly from the remote chart repository, the installation will stop at the Kubernetes version check.
To bypass this, modify the kubeVersion range in the chart file (Chart.yaml) to >=1.16.0-0 < 1.25.0-0. Note that the StreamNative team is working on an issue regarding the removal of the PodDisruptionBudget (pdb) v1beta1 API in Kubernetes 1.25. The current operators won’t work in 1.25+.
4. A good chart maintainer documents all configurations in values.yaml. This values.yaml file is straightforward: it describes the operators you can install (zookeeper-operator, bookkeeper-operator, and pulsar-operator for the broker and proxy). It also contains the image repository locations and tags, as well as operator details such as cluster roles and roles, service accounts, and operator resource limits and requests. If you want to pull the images from a private repository, change the image repository URL to point to it. For more information about the role of the values.yaml file in Helm, see the Helm documentation.
In this example, I kept the default values in values.yaml. I will come back to modify some configurations (for example, CRD roles and cluster role bindings) in a more restrictive environment.
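The fetch-and-edit steps in this section could be sketched as below. This is a minimal helper under stated assumptions, not the exact commands I ran: the chart name pulsar-operator and the function name relax_kube_version are hypothetical, and the helper simply replaces whatever kubeVersion line the fetched Chart.yaml contains.

```shell
# Rewrite the kubeVersion constraint in a fetched chart's Chart.yaml.
#   $1: chart directory (for example, the one created by `helm fetch --untar`)
#   $2: new constraint; defaults to the range used in this post
relax_kube_version() {
  chart_file="$1/Chart.yaml"
  default_range='>=1.16.0-0 < 1.25.0-0'
  range="${2:-$default_range}"
  # Replace the whole kubeVersion line. Writing to a temp file keeps the
  # edit portable across GNU and BSD sed.
  sed "s/^kubeVersion:.*/kubeVersion: \"$range\"/" "$chart_file" > "$chart_file.tmp" &&
    mv "$chart_file.tmp" "$chart_file"
}

# Assumed usage (the chart name pulsar-operator is hypothetical here):
#   helm fetch streamnative/pulsar-operator --untar
#   relax_kube_version ./pulsar-operator
#   grep kubeVersion ./pulsar-operator/Chart.yaml
```

Editing the constraint only gets the chart past the version check; it does not make the operators compatible with a Kubernetes release they were not built for.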
Deploy the Pulsar Operators
1. After reviewing the values, use helm install to deploy the Pulsar Operators in the sn-operator namespace through the local chart. As I mentioned above, I fetched the chart locally to change the Kubernetes version and inspect the values.yaml file. If your Kubernetes version is compatible (1.16-1.23), you can simply use the helm install command directly as stated in the documentation.
2. Check all resources in the sn-operator namespace. You should find the following components.
3. The related Kubernetes API resources have also been created.
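The install and the two checks above might be sketched as follows. The release name and chart directory are hypothetical, and filtering on the string streamnative assumes the operators' API groups contain that word.

```shell
# Hypothetical release name; CHART_DIR is the locally fetched, edited chart.
RELEASE_NAME="pulsar-operators"
CHART_DIR="./pulsar-operator"
OPERATOR_NAMESPACE="sn-operator"

# Install the operators from the local chart, creating the namespace if needed.
install_operators() {
  helm install "$RELEASE_NAME" "$CHART_DIR" \
    --namespace "$OPERATOR_NAMESPACE" --create-namespace
}

# List the operator workloads and the custom resource kinds they registered.
# ASSUMPTION: the CRD API groups contain the string "streamnative".
check_operators() {
  kubectl get all --namespace "$OPERATOR_NAMESPACE"
  kubectl get crd | grep streamnative
  kubectl api-resources | grep streamnative
}

# Run only where both tools and the local chart are available.
if command -v helm >/dev/null 2>&1 && command -v kubectl >/dev/null 2>&1 &&
  [ -d "$CHART_DIR" ]; then
  install_operators && check_operators ||
    echo "install or checks failed; inspect the output above" >&2
fi
```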
Deploy ZooKeeper, BookKeeper, PulsarBroker, and PulsarProxy
As shown in the previous section, there are seven controllers/operators, and each handles different “kinds” of custom resources, including Pulsar Operator CRDs (PulsarBroker, PulsarProxy, ZooKeeperCluster, and BookKeeperCluster) and Resources Operator CRDs (PulsarTenant, PulsarNamespace, PulsarTopic, PulsarPermission, and PulsarConnection).
Note that you need the Pulsar Resources Operator to create topics, tenants, namespaces, and permissions. The CRDs created by the Helm chart contain both Pulsar cluster CRDs and resource CRDs.
As with standard Kubernetes controllers and Deployments, we tell the controller what we want by describing the cluster in a custom resource (CR). In a regular Kubernetes Deployment manifest, you put all kinds of components in a YAML file and use kubectl apply or kubectl create to create Pods, Services, ConfigMaps, and other resources. Similarly, you can put ZooKeeper, BookKeeper, and PulsarBroker cluster definitions in a single YAML file, and then deploy them in one shot.
To make the deployment easier to understand and troubleshoot, I will break it down into three parts: ZooKeeper, BookKeeper, and then the PulsarBroker. As this blog is focused on the installation of these Pulsar components on Kubernetes, I will not explain their concepts in detail. If you want to understand the dependencies among the three, check out Sijie Guo’s TGIP YouTube video or refer to the Pulsar documentation.
1. The following is the ZooKeeperCluster definition that the ZooKeeper controller/operator will use to deploy a ZooKeeper cluster. It is very similar to a Kubernetes Deployment. It defines the image location, version, replicas, resources, and persistent storage properties. Other properties, such as JVM flags for tuning, are also available. I will leave them for later because the focus here is on getting a cluster running, and the operator should help ensure that such extra configurations are applied automatically.
2. Apply this file and see what happens.
3. Check the ZooKeeper controller log and see what’s going on.
The ZooKeeper controller runs a reconcile loop that keeps checking the status of the ZooKeeperCluster named my in the sn-platform namespace. This is a recommended operator design pattern. The operator Pod log is handy for troubleshooting your Pulsar deployment.
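For reference, steps 1 to 3 might look like the sketch below. The apiVersion and the spec field names are my assumptions rather than a confirmed copy of the CRD schema, so check them with kubectl explain against the installed CRD. The resource sizes are illustrative, and the helper names are hypothetical.

```shell
# Write a ZooKeeperCluster manifest to a file.
#   $1: Pulsar image as repository:tag   $2: output file
# ASSUMPTION: apiVersion and spec field names are not confirmed by this post.
write_zk_manifest() {
  cat > "$2" <<EOF
apiVersion: zookeeper.streamnative.io/v1alpha1
kind: ZooKeeperCluster
metadata:
  name: my
  namespace: sn-platform
spec:
  image: $1
  replicas: 3
  pod:
    resources:
      requests:
        cpu: 50m
        memory: 256Mi
  persistence:
    reclaimPolicy: Retain
    data:
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 8Gi
EOF
}

# Apply the manifest, list the Pods, then follow the ZooKeeper operator's log.
#   $1: manifest file
#   $2: name of the ZooKeeper operator Deployment, as listed by
#       `kubectl get deploy -n sn-operator`
apply_and_watch() {
  kubectl create namespace sn-platform
  kubectl apply -f "$1"
  kubectl get pod -n sn-platform
  kubectl logs -n sn-operator "deploy/$2" --follow
}
```

Call write_zk_manifest with the image you want to run and a file name, review the generated file, and then pass it to apply_and_watch together with the operator Deployment name.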
4. The next component is the BookKeeper cluster. Following the same pattern, you can define the BookKeeperCluster kind like the following. Note that zkServers is required in this YAML file, and it should point to the headless Service (reaching all three zkServers) of the ZooKeeper cluster you just created.
5. Apply the BookKeeper manifest file.
6. You can use the same command to find out what the BookKeeper operator is doing behind the scenes.
The log contains a number of warnings about API deprecations. This is expected because my Kubernetes version in this example is 1.24.0, which is newer than the versions the chart officially supports (1.16 to 1.23).
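The zkServers dependency from step 4 can be sketched as follows. This assumes the operator names the headless Service after the ZooKeeperCluster (my-zk-headless for a cluster named my) and that ZooKeeper listens on port 2181; confirm both with kubectl get svc -n sn-platform. The apiVersion is also an assumption, and journal and ledger storage settings are left out of this sketch even though the CRD may require them.

```shell
# Build the zkServers address for a ZooKeeperCluster.
#   $1: ZooKeeperCluster name
# ASSUMPTION: the headless Service is named <cluster>-zk-headless on port 2181.
zk_servers_for() {
  echo "$1-zk-headless:2181"
}

# Write a minimal BookKeeperCluster manifest.
#   $1: Pulsar image   $2: ZooKeeperCluster name   $3: output file
# Storage settings (journal and ledger volumes) are omitted from this sketch.
write_bk_manifest() {
  cat > "$3" <<EOF
apiVersion: bookkeeper.streamnative.io/v1alpha1
kind: BookKeeperCluster
metadata:
  name: my
  namespace: sn-platform
spec:
  image: $1
  replicas: 3
  zkServers: $(zk_servers_for "$2")
EOF
}
```

After generating the file, kubectl apply -f creates the cluster, and the BookKeeper operator log can be followed with the same kubectl logs pattern used for ZooKeeper.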
7. The next component is the broker cluster. The following is the PulsarBroker YAML file, where you can see that the broker also depends on zkServers. Note that I added config.custom in the descriptor. This turns on the broker’s WebSocket endpoint.
8. Like ZooKeeper and BookKeeper clusters, creating a broker cluster is the same as creating standard Kubernetes objects. You can use kubectl get pod -n <namespace> -w to watch the sequence of Pod creation. This is helpful to understand the dependencies among Pulsar components. I skipped the operator log in this blog. You can also run a similar command to trace the controller log.
9. When all Pods are up and running, check the Services and you can see that all their types are ClusterIP. This assumes that all producer and consumer workloads are inside the Kubernetes cluster. To test traffic from machines that are in my environment but outside the Kubernetes cluster, I need a Service of type LoadBalancer.
10. Now let’s deploy the proxy. The proxy is a bit tricky because TLS is enabled by default, which makes sense as it is the external gateway to connect to the Pulsar cluster. In this example, I turned off TLS on all components for simplicity. I will discuss enabling TLS using the operator in the next blog.
11. Apply the proxy manifest file and check the status of different resources.
As shown above, the proxy automatically obtained my external LoadBalancer IP (10.0.0.36). In this example, I used MetalLB to expose the proxy Service.
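Going back to step 7, a minimal PulsarBroker definition with the WebSocket setting, together with the Pod and Service checks used in steps 8 to 11, might be sketched as below. The apiVersion, the replica count, the my-zk-headless Service name, and the use of webSocketServiceEnabled under config.custom are assumptions that this post does not confirm; the proxy manifest is not reproduced here.

```shell
# Write a minimal PulsarBroker manifest with the WebSocket endpoint enabled.
#   $1: Pulsar image   $2: output file
# ASSUMPTIONS: apiVersion, replica count, the my-zk-headless Service name,
# and webSocketServiceEnabled as the key under config.custom.
write_broker_manifest() {
  cat > "$2" <<EOF
apiVersion: pulsar.streamnative.io/v1alpha1
kind: PulsarBroker
metadata:
  name: my
  namespace: sn-platform
spec:
  image: $1
  replicas: 2
  zkServers: my-zk-headless:2181
  config:
    custom:
      webSocketServiceEnabled: "true"
EOF
}

# Watch Pods start (Ctrl+C to stop), then list Services. A LoadBalancer
# Service shows its address in the EXTERNAL-IP column.
watch_pods_then_services() {
  kubectl get pod -n sn-platform -w
  kubectl get svc -n sn-platform
}
```

In my environment, the proxy Service is the one that shows an external address, because the broker, BookKeeper, and ZooKeeper Services stay ClusterIP.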
By now, you should have a running Pulsar cluster and an exposed Service endpoint that you can use to start producing and consuming messages. In the next blog, I will demonstrate how to write consumer and producer container images to interact with the Pulsar cluster.
More on Apache Pulsar
Pulsar has become one of the most active Apache projects over the past few years, with a vibrant community driving innovation and improvements to the project. Check out the following resources to learn more about Pulsar.