Note: StreamNative now offers a unified approach to managing Pulsar clusters on Kubernetes systems, transitioning from two distinct versions of operators—Pulsar Operators (Basic Version) and StreamNative Operator (Advanced Version)—to a single, consolidated operator, StreamNative Operator, effective from the start of 2024. As part of this change, we will cease the release of new versions of Pulsar Operators, with future updates and enhancements being exclusively available through the StreamNative Operator, accessible only via StreamNative's paid services.
An operator is a controller that manages an application on Kubernetes. It helps the SRE team automate infrastructure changes, including deployment, updates, and scaling, because it provides full lifecycle management for the application. Starting with this blog, I will post a series of articles about how to use StreamNative Pulsar Operators to better manage applications on Kubernetes. In this first blog, I will demonstrate how to use Pulsar Operators to deploy Pulsar on Kubernetes.
Before I introduce the specific installation steps, let’s take a look at the three sets of Operators provided by StreamNative.
- Pulsar Operators. Kubernetes controllers that provide a declarative API to simplify the deployment and management of Pulsar clusters on Kubernetes. Specifically, there are three Operators:
  - Pulsar Operator. Manages the deployment of the Pulsar broker and Pulsar proxy for the Pulsar cluster.
  - BookKeeper Operator. Provides full lifecycle management for the BookKeeper cluster.
  - ZooKeeper Operator. Provides full lifecycle management for the ZooKeeper cluster.
- Pulsar Resources Operator. An independent controller that automatically manages Pulsar resources (for example, tenants, namespaces, topics, and permissions) on Kubernetes through manifest files.
- Function Mesh Operator. Runs Pulsar Functions and connectors on Kubernetes and lets you compose different functions to process data.
You can find the three sets of Operators in the streamnative and function-mesh Helm chart repositories. If you haven’t added these two repositories yet, add them with the helm repo add command before searching for the operators, as shown below.
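For example (the streamnative repository URL also appears later in this post; the function-mesh URL is the one documented by StreamNative, so verify it against the current Function Mesh docs):
helm repo add streamnative https://charts.streamnative.io
helm repo add function-mesh https://charts.functionmesh.io
helm repo update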
# helm search repo streamnative
NAME CHART VERSION APP VERSION DESCRIPTION
streamnative/pulsar-operator 0.11.5 0.11.5 Apache Pulsar Operators Helm chart for Kubernetes
streamnative/pulsar-resources-operator v0.0.8 v0.0.1 Pulsar Resources Operator Helm chart for Pulsar…
# helm search repo function-mesh
NAME CHART VERSION APP VERSION DESCRIPTION
function-mesh/function-mesh-operator 0.2.1 0.3.0 function mesh operator Helm chart for Kubernetes
For a quick start, you can follow the official installation documentation. This blog explores a step-by-step way to set up a Pulsar cluster by deploying its key components separately through the Pulsar Operators.
Fetch and check the Pulsar Operators Helm chart
1. In this example, instead of running “helm install” against the remote repository directly, I fetched the chart locally to check its details.
helm repo add streamnative https://charts.streamnative.io
helm repo update
helm fetch streamnative/pulsar-operator --untar
cd pulsar-operator
2. The helm fetch command with the --untar option downloads the chart and unpacks it on your local machine. Let’s check the chart file.
# cat Chart.yaml
apiVersion: v1
appVersion: 0.9.4
description: Apache Pulsar Operators Helm chart for Kubernetes
home: https://streamnative.io
icon: http://pulsar.apache.org/img/pulsar.svg
kubeVersion: '>= 1.16.0-0 < 1.24.0-0'
maintainers:
- email: support@streamnative.io
  name: StreamNative Support
name: pulsar-operator
sources:
- https://github.com/streamnative/pulsar-operators
version: 0.10.0
3. The chart file describes the basic information of the Helm chart, such as the maintainer and app version. The current chart supports Kubernetes versions 1.16 to 1.23. As my existing Kubernetes version is 1.24.0, if I run “helm install” directly from the remote chart repository, the installation will stop at the Kubernetes version check.
# helm install sn-operator -n test streamnative/pulsar-operator
Error: INSTALLATION FAILED: chart requires kubeVersion: >= 1.16.0-0 < 1.24.0-0 which is incompatible with Kubernetes v1.24.0
To bypass this, modify the kubeVersion range in the local Chart.yaml to '>= 1.16.0-0 < 1.25.0-0'. Note that the StreamNative team is working on an issue related to the removal of the policy/v1beta1 PodDisruptionBudget API in Kubernetes 1.25, so the current operators won’t work on 1.25 or later.
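The change is a single line in the fetched chart:
# pulsar-operator/Chart.yaml
kubeVersion: '>= 1.16.0-0 < 1.25.0-0'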
4. A good chart maintainer documents all configurations in values.yaml. This values.yaml file is pretty straightforward, as it describes the operators you can install, including zookeeper-operator, bookkeeper-operator, and pulsar-operator (broker/proxy). The file also contains the image repository locations and tags, as well as operator details like cluster roles/roles, service accounts, and operator resource limits and requests. Additionally, if you want to pull the images from a private repository, simply change the image repository URL to your private repository. For more information about the role of the values.yaml file in Helm, see the Helm documentation.
In this example, I kept the default values in values.yaml. I will come back to modify some configurations (for example, CRD roles and cluster role bindings) in a more restrictive environment.
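If you do want to customize the chart, a common pattern is to export the default values into your own file and pass it to helm install later with -f (the my-values.yaml file name below is just an example):
helm show values streamnative/pulsar-operator > my-values.yaml
# edit my-values.yaml, for example to point the image repository fields at your private registry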
Deploy the Pulsar Operators
1. After reviewing the values, use helm install to deploy the Pulsar Operators in the sn-operator namespace through the local chart. As I mentioned above, I fetched the chart locally to change the Kubernetes version and inspect the values.yaml file. If your Kubernetes version is compatible (1.16-1.23), you can simply use the helm install command directly as stated in the documentation.
# kubectl create namespace sn-operator
# helm install -n sn-operator pulsar-operator .
2. Check all resources in the sn-operator namespace. You should find the following components.
# kubectl get all -n sn-operator
NAME READY STATUS RESTARTS AGE
pod/pulsar-operator-bookkeeper-controller-manager-9c596465-h8nbh 1/1 Running 0 16h
pod/pulsar-operator-pulsar-controller-manager-6f8699ffc-gr989 1/1 Running 0 16h
pod/pulsar-operator-zookeeper-controller-manager-7b54b76c79-rsm6t 1/1 Running 0 16h
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/pulsar-operator-bookkeeper-controller-manager 1/1 1 1 16h
deployment.apps/pulsar-operator-pulsar-controller-manager 1/1 1 1 16h
deployment.apps/pulsar-operator-zookeeper-controller-manager 1/1 1 1 16h
NAME DESIRED CURRENT READY AGE
replicaset.apps/pulsar-operator-bookkeeper-controller-manager-9c596465 1 1 1 16h
replicaset.apps/pulsar-operator-pulsar-controller-manager-6f8699ffc 1 1 1 16h
replicaset.apps/pulsar-operator-zookeeper-controller-manager-7b54b76c79 1 1 1 16h
3. The related Kubernetes API resources have also been created.
# kubectl api-resources | grep pulsar
pulsarbrokers pb,broker pulsar.streamnative.io/v1alpha1 true PulsarBroker
pulsarconnections pconn pulsar.streamnative.io/v1alpha1 true PulsarConnection
pulsarnamespaces pns pulsar.streamnative.io/v1alpha1 true PulsarNamespace
pulsarpermissions ppermission pulsar.streamnative.io/v1alpha1 true PulsarPermission
pulsarproxies pp,proxy pulsar.streamnative.io/v1alpha1 true PulsarProxy
pulsartenants ptenant pulsar.streamnative.io/v1alpha1 true PulsarTenant
pulsartopics ptopic pulsar.streamnative.io/v1alpha1 true PulsarTopic
Deploy ZooKeeper, BookKeeper, PulsarBroker, and PulsarProxy
As shown in the previous section, the controllers/operators each handle different “kinds” of custom resources, including Pulsar Operator CRDs (PulsarBroker, PulsarProxy, ZooKeeperCluster, and BookKeeperCluster) and Resources Operator CRDs (PulsarTenant, PulsarNamespace, PulsarTopic, PulsarPermission, and PulsarConnection).
Note that you need the Pulsar Resources Operator to create topics, tenants, namespaces, and permissions. The CRDs created by the Helm chart contain both Pulsar cluster CRDs and resource CRDs.
As with standard Kubernetes controllers and Deployments, we tell each controller what we want by feeding it cluster definitions as Custom Resources (CRs). In a regular Kubernetes manifest, you put all kinds of components in a YAML file and use kubectl apply or kubectl create to create Pods, Services, ConfigMaps, and other resources. Similarly, you can put the ZooKeeper, BookKeeper, and PulsarBroker cluster definitions in a single YAML file and deploy them in one shot, as sketched below.
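A minimal sketch of that one-shot layout (the CR bodies are the ones shown in the following steps, and pulsar-cluster.yaml is just an example file name):
# pulsar-cluster.yaml
---
apiVersion: zookeeper.streamnative.io/v1alpha1
kind: ZooKeeperCluster
# ... ZooKeeperCluster metadata and spec from step 1 ...
---
apiVersion: bookkeeper.streamnative.io/v1alpha1
kind: BookKeeperCluster
# ... BookKeeperCluster metadata and spec from step 4 ...
---
apiVersion: pulsar.streamnative.io/v1alpha1
kind: PulsarBroker
# ... PulsarBroker metadata and spec from step 7 ...
You would then apply it with kubectl apply -f pulsar-cluster.yaml.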
In order to understand and troubleshoot the deployment, I will break it down into three parts - ZooKeeper, BookKeeper, and then the PulsarBroker. As this blog is focused on the installation of these Pulsar components on Kubernetes, I will not explain their concepts in detail. If you want to understand the dependencies among the three, check out Sijie Guo’s TGIP YouTube video or refer to the Pulsar documentation.
1. The following is the ZooKeeperCluster definition that the ZooKeeper controller/operator will use to deploy a ZooKeeper cluster. It is very similar to a Kubernetes Deployment. It defines the image location, version, replicas, resources, and persistent storage properties. There are other properties, such as JVM flags for tuning, which I will discuss later; for now the focus is on getting a running cluster, and the operator takes care of rolling out extra configuration changes automatically.
---
apiVersion: zookeeper.streamnative.io/v1alpha1
kind: ZooKeeperCluster
metadata:
  name: my
  namespace: sn-platform
spec:
  image: streamnative/pulsar:2.9.2.15
  replicas: 3
  pod:
    resources:
      requests:
        cpu: "50m"
        memory: "256Mi"
      limits:
        cpu: "50m"
        memory: "256Mi"
  persistence:
    reclaimPolicy: Retain
    data:
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: "10Gi"
    dataLog:
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: "2Gi"
2. Create the sn-platform namespace if it does not exist yet, then apply this file and see what happens.
# kubectl apply -f zk-cluster.yaml
zookeepercluster.zookeeper.streamnative.io/my created
# kubectl get pod -n sn-platform -w
NAME READY STATUS RESTARTS AGE
my-zk-0 1/1 Running 0 25s
my-zk-1 1/1 Running 0 25s
my-zk-2 1/1 Running 0 25s
# kubectl get svc -n sn-platform
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
my-zk ClusterIP 10.104.64.179 <none> 2181/TCP,8000/TCP,9990/TCP 42s
my-zk-headless ClusterIP None <none> 2181/TCP,2888/TCP,3888/TCP,8000/TCP,9990/TCP 42s
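You can also query the custom resource itself to see the status the operator reports (the exact status columns may vary by operator version):
kubectl get zookeeperclusters.zookeeper.streamnative.io -n sn-platform
kubectl describe zookeepercluster my -n sn-platform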
3. Check the ZooKeeper controller log and see what’s going on.
# kubectl logs -n sn-operator pulsar-operator-zookeeper-controller-manager-7b54b76c79-rsm6t
{"severity":"info","timestamp":"2022-05-11T02:47:07Z","logger":"controllers.ZooKeeperCluster","message":"Reconciling ZooKeeperCluster","Request.Namespace":"sn-platform","Request.Name":"my"}
{"severity":"info","timestamp":"2022-05-11T02:47:07Z","logger":"controllers.ZooKeeperCluster","message":"Updating an existing ZooKeeper StatefulSet","StatefulSet.Namespace":"sn-platform","StatefulSet.Name":"my-zk"}
{"severity":"debug","timestamp":"2022-05-11T02:47:07Z","logger":"controller","message":"Successfully Reconciled","reconcilerGroup":"zookeeper.streamnative.io","reconcilerKind":"ZooKeeperCluster","controller":"zookeepercluster","name":"my","namespace":"sn-platform"}
{"severity":"info","timestamp":"2022-05-11T02:47:07Z","logger":"controllers.ZooKeeperCluster","message":"Reconciling ZooKeeperCluster","Request.Namespace":"sn-platform","Request.Name":"my"}
{"severity":"info","timestamp":"2022-05-11T02:47:07Z","logger":"controllers.ZooKeeperCluster","message":"Updating an existing ZooKeeper StatefulSet","StatefulSet.Namespace":"sn-platform","StatefulSet.Name":"my-zk"}
{"severity":"debug","timestamp":"2022-05-11T02:47:07Z","logger":"controller","message":"Successfully Reconciled","reconcilerGroup":"zookeeper.streamnative.io","reconcilerKind":"ZooKeeperCluster","controller":"zookeepercluster","name":"my","namespace":"sn-platform"}
{"severity":"info","timestamp":"2022-05-11T02:47:07Z","logger":"controllers.ZooKeeperCluster","message":"Reconciling ZooKeeperCluster","Request.Namespace":"sn-platform","Request.Name":"my"}
{"severity":"info","timestamp":"2022-05-11T02:47:07Z","logger":"controllers.ZooKeeperCluster","message":"Updating an existing ZooKeeper StatefulSet","StatefulSet.Namespace":"sn-platform","StatefulSet.Name":"my-zk"}
The ZooKeeper controller runs a reconcile loop that keeps checking the status of the ZooKeeperCluster named my in the sn-platform namespace; this watch-and-reconcile cycle is the recommended operator design pattern. The operator Pod log is handy for troubleshooting your Pulsar deployment.
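If you don’t want to look up the Pod name each time, you can tail the logs through the Deployment instead (the Deployment names are the ones listed in the earlier kubectl get all output):
kubectl logs -n sn-operator -f deployment/pulsar-operator-zookeeper-controller-manager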
4. The next component is the BookKeeper cluster. Following the same pattern, you can define the BookKeeperCluster kind as follows. Note that zkServers is required in this YAML file, and it should point to the headless Service (which reaches all three ZooKeeper servers) of the ZooKeeper cluster you just created.
---
apiVersion: bookkeeper.streamnative.io/v1alpha1
kind: BookKeeperCluster
metadata:
  name: my
  namespace: sn-platform
spec:
  image: streamnative/pulsar:2.9.2.15
  replicas: 3
  pod:
    resources:
      requests:
        cpu: "200m"
        memory: "256Mi"
  storage:
    reclaimPolicy: Retain
    journal:
      numDirsPerVolume: 1
      numVolumes: 1
      volumeClaimTemplate:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: "8Gi"
    ledger:
      numDirsPerVolume: 1
      numVolumes: 1
      volumeClaimTemplate:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: "16Gi"
  zkServers: my-zk-headless:2181
5. Apply the BookKeeper manifest file.
# kubectl apply -f bk-cluster.yaml
bookkeepercluster.bookkeeper.streamnative.io/my created
# kubectl get pod -n sn-platform
NAME READY STATUS RESTARTS AGE
my-bk-0 1/1 Running 0 90s
my-bk-1 1/1 Running 0 90s
my-bk-2 1/1 Running 0 90s
my-bk-auto-recovery-0 1/1 Running 0 48s
my-zk-0 1/1 Running 0 4m51s
my-zk-1 1/1 Running 0 4m51s
my-zk-2 1/1 Running 0 4m51s
6. You can use the same command to find out what the BookKeeper operator is doing behind the scenes.
# kubectl logs -n sn-operator pulsar-operator-bookkeeper-controller-manager-9c596465-h8nbh
W0512 11:45:15.235940 1 warnings.go:67] policy/v1beta1 PodDisruptionBudget is deprecated in v1.21+, unavailable in v1.25+; use policy/v1 PodDisruptionBudget
{"severity":"info","timestamp":"2022-05-12T11:47:27Z","logger":"controllers.BookKeeperCluster","message":"Reconciling BookKeeperCluster","Request.Namespace":"sn-platform","Request.Name":"my"}
{"severity":"info","timestamp":"2022-05-12T11:47:27Z","logger":"controllers.BookKeeperCluster","message":"Updating the status for the BookKeeperCluster","Namespace":"sn-platform","Name":"my","Status":{"observedGeneration":4,"replicas":4,"readyReplicas":4,"updatedReplicas":4,"labelSelector":"cloud.streamnative.io/app=pulsar,cloud.streamnative.io/cluster=my,cloud.streamnative.io/component=bookie","conditions":[{"type":"AutoRecovery","status":"True","reason":"Deploy","message":"Ready","lastTransitionTime":"2022-05-08T19:58:26Z"},{"type":"Bookie","status":"True","reason":"Ready","message":"Bookies are ready","lastTransitionTime":"2022-05-09T00:34:12Z"},{"type":"Initialization","status":"True","reason":"Initialization","message":"Initialization succeeded","lastTransitionTime":"2022-05-08T19:57:43Z"},{"type":"Ready","status":"True","reason":"Ready","lastTransitionTime":"2022-05-09T00:34:12Z"}]}}
{"severity":"debug","timestamp":"2022-05-12T11:47:27Z","logger":"controller","message":"Successfully Reconciled","reconcilerGroup":"bookkeeper.streamnative.io","reconcilerKind":"BookKeeperCluster","controller":"bookkeepercluster","name":"my","namespace":"sn-platform"}
W0512 11:49:18.419391 1 warnings.go:67] autoscaling/v2beta2 HorizontalPodAutoscaler is deprecated in v1.23+, unavailable in v1.26+; use autoscaling/v2 HorizontalPodAutoscaler
W0512 11:52:35.237812 1 warnings.go:67] policy/v1beta1 PodDisruptionBudget is deprecated in v1.21+, unavailable in v1.25+; use policy/v1 PodDisruptionBudget
W0512 11:57:06.421507 1 warnings.go:67] autoscaling/v2beta2 HorizontalPodAutoscaler is deprecated in v1.23+, unavailable in v1.26+; use autoscaling/v2 HorizontalPodAutoscaler
W0512 11:58:43.240049 1 warnings.go:67] policy/v1beta1 PodDisruptionBudget is deprecated in v1.21+, unavailable in v1.25+; use policy/v1 PodDisruptionBudget
W0512 12:04:06.423448 1 warnings.go:67] autoscaling/v2beta2 HorizontalPodAutoscaler is deprecated in v1.23+, unavailable in v1.26+; use autoscaling/v2 HorizontalPodAutoscaler
W0512 12:05:03.242609 1 warnings.go:67] policy/v1beta1 PodDisruptionBudget is deprecated in v1.21+, unavailable in v1.25+; use policy/v1 PodDisruptionBudget
W0512 12:09:08.425304 1 warnings.go:67] autoscaling/v2beta2 HorizontalPodAutoscaler is deprecated in v1.23+, unavailable in v1.26+; use autoscaling/v2 HorizontalPodAutoscaler
W0512 12:14:15.245078 1 warnings.go:67] policy/v1beta1 PodDisruptionBudget is deprecated in v1.21+, unavailable in v1.25+; use policy/v1 PodDisruptionBudget
W0512 12:18:47.427470 1 warnings.go:67] autoscaling/v2beta2 HorizontalPodAutoscaler is deprecated in v1.23+, unavailable in v1.26+; use autoscaling/v2 HorizontalPodAutoscaler
W0512 12:19:30.247840 1 warnings.go:67] policy/v1beta1 PodDisruptionBudget is deprecated in v1.21+, unavailable in v1.25+; use policy/v1 PodDisruptionBudget
W0512 12:25:04.249159 1 warnings.go:67] policy/v1beta1 PodDisruptionBudget is deprecated in v1.21+, unavailable in v1.25+; use policy/v1 PodDisruptionBudget
W0512 12:25:13.430394 1 warnings.go:67] autoscaling/v2beta2 HorizontalPodAutoscaler is deprecated in v1.23+, unavailable in v1.26+; use autoscaling/v2 HorizontalPodAutoscaler
There are a number of warnings about API deprecations. These are expected because my Kubernetes version in this example is 1.24.0, which deprecates several older API versions, such as policy/v1beta1 PodDisruptionBudget and autoscaling/v2beta2 HorizontalPodAutoscaler.
7. The next component is the broker cluster. The following is the PulsarBroker YAML file, where you can see that the broker also depends on zkServers. Note that I added config.custom in the descriptor, which turns on the broker’s WebSocket endpoint.
---
apiVersion: pulsar.streamnative.io/v1alpha1
kind: PulsarBroker
metadata:
  name: my
  namespace: sn-platform
spec:
  image: streamnative/pulsar:2.9.2.15
  pod:
    resources:
      requests:
        cpu: 200m
        memory: 256Mi
    terminationGracePeriodSeconds: 30
  config:
    custom:
      webSocketServiceEnabled: "true"
  replicas: 2
  zkServers: my-zk-headless:2181
8. Like ZooKeeper and BookKeeper clusters, creating a broker cluster is the same as creating standard Kubernetes objects. You can use kubectl get pod -n <namespace> -w to watch the sequence of Pod creation. This is helpful to understand the dependencies among Pulsar components. I skipped the operator log in this blog. You can also run a similar command to trace the controller log.
# kubectl apply -f br-cluster.yaml
pulsarbroker.pulsar.streamnative.io/my created
# kubectl get pod -n sn-platform -w
NAME READY STATUS RESTARTS AGE
my-bk-0 1/1 Running 0 2m11s
my-bk-1 1/1 Running 0 2m11s
my-bk-2 1/1 Running 0 2m11s
my-bk-auto-recovery-0 1/1 Running 0 89s
my-broker-metadata-init-gghqc 0/1 Completed 0 6s
my-zk-0 1/1 Running 0 5m32s
my-zk-1 1/1 Running 0 5m32s
my-zk-2 1/1 Running 0 5m32s
my-broker-metadata-init-gghqc 0/1 Completed 0 7s
my-broker-metadata-init-gghqc 0/1 Completed 0 7s
my-broker-0 0/1 Pending 0 0s
my-broker-1 0/1 Pending 0 0s
my-broker-0 0/1 Pending 0 0s
my-broker-1 0/1 Pending 0 0s
my-broker-1 0/1 Init:0/1 0 0s
my-broker-0 0/1 Init:0/1 0 0s
my-broker-metadata-init-gghqc 0/1 Terminating 0 8s
my-broker-metadata-init-gghqc 0/1 Terminating 0 8s
my-broker-0 0/1 Init:0/1 0 0s
my-broker-1 0/1 Init:0/1 0 0s
my-broker-1 0/1 PodInitializing 0 1s
my-broker-0 0/1 PodInitializing 0 1s
my-broker-1 0/1 Running 0 2s
my-broker-0 0/1 Running 0 2s
my-broker-0 0/1 Running 0 10s
my-broker-1 0/1 Running 0 10s
my-broker-0 1/1 Running 0 40s
my-broker-1 1/1 Running 0 40s
9. When all Pods are up and running, check the Services; you can see that they are all of type ClusterIP, which assumes that all producer and consumer workloads run inside the Kubernetes cluster. To test traffic from machines in my environment that sit outside the Kubernetes cluster, I need a LoadBalancer.
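A quick way to confirm the Service types at this point:
kubectl get svc -n sn-platform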
10. Now let’s deploy the proxy. The proxy is a bit tricky because TLS is enabled by default, which makes sense as it is the external gateway to connect to the Pulsar cluster. In this example, I turned off TLS on all components for simplicity. I will discuss enabling TLS using the operator in the next blog.
---
apiVersion: pulsar.streamnative.io/v1alpha1
kind: PulsarProxy
metadata:
  name: my
  namespace: sn-platform
spec:
  brokerAddress: my-broker-headless
  dnsNames: []
  #webSocketServiceEnabled: true
  image: streamnative/pulsar:2.9.2.15
  config:
    tls:
      enabled: false
      issuerRef:
        name: ""
  pod:
    resources:
      requests:
        cpu: 200m
        memory: 256Mi
  replicas: 1
11. Apply the proxy manifest file and check the status of different resources.
# kubectl apply -f px-cluster.yaml
pulsarproxy.pulsar.streamnative.io/my created
# kubectl get pod -n sn-platform
NAME READY STATUS RESTARTS AGE
my-bk-0 1/1 Running 0 44m
my-bk-1 1/1 Running 0 44m
my-bk-2 1/1 Running 0 44m
my-bk-auto-recovery-0 1/1 Running 0 43m
my-broker-0 1/1 Running 0 42m
my-broker-1 1/1 Running 0 42m
my-proxy-0 1/1 Running 0 57s
my-zk-0 1/1 Running 0 47m
my-zk-1 1/1 Running 0 47m
my-zk-2 1/1 Running 0 47m
# kubectl get svc -n sn-platform
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
my-bk ClusterIP 10.104.212.43 <none> 3181/TCP,8000/TCP 44m
my-bk-auto-recovery-headless ClusterIP None <none> 3181/TCP,8000/TCP 44m
my-bk-headless ClusterIP None <none> 3181/TCP,8000/TCP 44m
my-broker ClusterIP 10.99.107.224 <none> 6650/TCP,8080/TCP 41m
my-broker-headless ClusterIP None <none> 6650/TCP,8080/TCP 41m
my-proxy-external LoadBalancer 10.109.250.31 10.0.0.36 6650:32751/TCP,8080:30322/TCP 33s
my-proxy-headless ClusterIP None <none> 6650/TCP,8080/TCP 33s
my-zk ClusterIP 10.104.64.179 <none> 2181/TCP,8000/TCP,9990/TCP 47m
my-zk-headless ClusterIP None <none> 2181/TCP,2888/TCP,3888/TCP,8000/TCP,9990/TCP 47m
As shown above, the proxy automatically obtained my external LoadBalancer IP (10.0.0.36). In this example, I used MetalLB to expose the proxy Service.
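To quickly smoke-test the endpoint from outside the cluster, you can point the standard Pulsar CLI tools at the LoadBalancer IP (assuming the ports shown in the Service listing: 6650 for the binary protocol and 8080 for HTTP):
bin/pulsar-admin --admin-url http://10.0.0.36:8080 clusters list
bin/pulsar-client --url pulsar://10.0.0.36:6650 produce public/default/test-topic -m "hello-pulsar"
By default, Pulsar auto-creates the test topic, so the second command is only a connectivity check, not a load test.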
Summary
By now, you should have a running Pulsar cluster and an exposed Service endpoint that you can use to start producing and consuming messages. In the next blog, I will demonstrate how to write consumer and producer container images to interact with the Pulsar cluster.
More on Apache Pulsar
Pulsar has become one of the most active Apache projects over the past few years, with a vibrant community driving innovation and improvements to the project. Check out the following resources to learn more about Pulsar.