Jan 12, 2022
16 min read

Pulsar Isolation Part III: Separate Pulsar Clusters Sharing a Single BookKeeper Cluster

Ran Gao
Software Engineer, StreamNative

This is the third blog in our 4-part blog series on achieving resource isolation in Apache Pulsar. The first blog gave an overview of the three approaches to implement isolation in Pulsar:

  1. Separate Pulsar clusters that use separate BookKeeper clusters: This shared-nothing approach offers the highest level of isolation and is suitable for storing highly sensitive data, such as personally identifiable information or financial records. Our second blog in this series provides a step-by-step tutorial for this approach.
  2. Separate Pulsar clusters that share one BookKeeper cluster: This approach uses separate Pulsar broker clusters to isolate end users from one another, and it lets you use different authentication methods for different use cases. At the same time, you benefit from a shared storage layer, such as a reduced hardware footprint and lower hardware and maintenance costs.
  3. A single Pulsar cluster and a single BookKeeper cluster: This is the more traditional approach that takes advantage of Pulsar’s built-in multi-tenancy features.

In this blog, we show you how to implement the single, shared BookKeeper approach with an example. We will deploy two Pulsar clusters that share one BookKeeper cluster following the steps below:

  1. Deploy two Pulsar clusters that share one BookKeeper cluster
  2. Verify data isolation between the Pulsar clusters
  3. Scale bookies up and down

Set up the Shared BookKeeper Cluster

First, we set up the shared BookKeeper cluster. All components in this example run on a single computer with an 8-core CPU and 16 GB of memory. Figures 1 and 2 show the architecture.

All metadata services (ZooKeeper) run as single nodes in this example. We don’t discuss their deployment in detail in this blog.
Figure 1: Each cluster has its own brokers and local metadata store and shares the BookKeeper and Configuration Store.

Figure 2: Inside the shared BookKeeper cluster, each cluster will have its own affinity group of bookies. These bookie groups ensure that each cluster’s respective data remains isolated from one another.

Deploy Clusters

1. Download the latest binary Pulsar package. At the time of writing, this is the 2.8.1 package.

https://www.apache.org/dyn/mirrors/mirrors.cgi?action=download&filename=pulsar/pulsar-2.8.1/apache-pulsar-2.8.1-bin.tar.gz

2. Extract the binary package.

tar -zxvf apache-pulsar-2.8.1-bin.tar.gz

3. Prepare the following cluster directories, and change the configuration in each directory as described in the table below.

Use the current directory as PULSAR_HOME (for example, export PULSAR_HOME=$(pwd)) and create the following directory topology.

cp -r apache-pulsar-2.8.1 configuration-store
mkdir -p bk-cluster
cp -r apache-pulsar-2.8.1 bk-cluster/bk1
cp -r apache-pulsar-2.8.1 bk-cluster/bk2
cp -r apache-pulsar-2.8.1 bk-cluster/bk3
cp -r apache-pulsar-2.8.1 bk-cluster/bk4
mkdir -p cluster1
cp -r apache-pulsar-2.8.1 cluster1/zk1
cp -r apache-pulsar-2.8.1 cluster1/broker1
mkdir -p cluster2
cp -r apache-pulsar-2.8.1 cluster2/zk1
cp -r apache-pulsar-2.8.1 cluster2/broker1
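The cp commands above follow a simple pattern. Here is a minimal sketch that builds the same layout with loops. It assumes bash and that the extracted apache-pulsar-2.8.1 directory sits in the current directory. create_topology is a hypothetical helper and is not part of Pulsar.

```shell
# Hypothetical helper (not part of Pulsar): copy an extracted Pulsar
# distribution into the directory layout used in this tutorial.
# Usage: create_topology <dist-dir> <num-bookies> <cluster-name>...
create_topology() {
  local dist="$1" num_bookies="$2" i cluster
  shift 2

  # Shared configuration store
  cp -r "$dist" configuration-store || return 1

  # Shared BookKeeper cluster: bk-cluster/bk1 .. bk-cluster/bkN
  mkdir -p bk-cluster || return 1
  for ((i = 1; i <= num_bookies; i++)); do
    cp -r "$dist" "bk-cluster/bk$i" || return 1
  done

  # One local ZooKeeper (zk1) and one broker (broker1) per Pulsar cluster
  for cluster in "$@"; do
    mkdir -p "$cluster" || return 1
    cp -r "$dist" "$cluster/zk1" || return 1
    cp -r "$dist" "$cluster/broker1" || return 1
  done
}
```

For the four-bookie, two-cluster layout in this example, run create_topology apache-pulsar-2.8.1 4 cluster1 cluster2 from PULSAR_HOME. Run it only once. If a target directory already exists, cp -r copies the distribution into it instead of replacing it.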

The directory topology is outlined below. bk5 is created later, in the Scale up section.

  • PULSAR_HOME
    • configuration-store
    • bk-cluster
      • bk1
      • bk2
      • bk3
      • bk4
      • bk5
    • cluster1
      • zk1
      • broker1
    • cluster2
      • zk1
      • broker1
[Table: configuration changes for each directory]

4. Start the configuration store and the local metadata store (ZooKeeper) of each cluster, then initialize the cluster metadata for cluster1 and cluster2.

$PULSAR_HOME/configuration-store/bin/pulsar-daemon start configuration-store
$PULSAR_HOME/cluster1/zk1/bin/pulsar-daemon start zookeeper
$PULSAR_HOME/cluster2/zk1/bin/pulsar-daemon start zookeeper

$PULSAR_HOME/configuration-store/bin/pulsar initialize-cluster-metadata \
--cluster cluster1 \
--zookeeper localhost:2182 \
--configuration-store localhost:2181 \
--web-service-url http://localhost:8080/ \
--broker-service-url pulsar://localhost:6650/

$PULSAR_HOME/configuration-store/bin/pulsar initialize-cluster-metadata \
--cluster cluster2 \
--zookeeper localhost:2183 \
--configuration-store localhost:2181 \
--web-service-url http://localhost:8081/ \
--broker-service-url pulsar://localhost:6651/
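The two initialize-cluster-metadata calls differ only in the cluster name and ports. In this tutorial, cluster1 uses ZooKeeper port 2182, web port 8080 and broker port 6650. cluster2 uses 2183, 8081 and 6651.

Below is a minimal sketch that derives those arguments from the cluster number. That numbering is only this example's convention, not something Pulsar requires. The function name and the PULSAR_CMD override are hypothetical; set PULSAR_CMD=echo to print the command instead of running it.

```shell
# Hypothetical helper: initialize metadata for cluster<N> using this
# tutorial's port convention (ZooKeeper 2181+N, web 8079+N, broker 6649+N).
# The configuration store is shared and always listens on localhost:2181.
# Set PULSAR_CMD=echo to print the command instead of running it.
init_cluster_metadata() {
  local n="$1"
  local pulsar="${PULSAR_CMD:-$PULSAR_HOME/configuration-store/bin/pulsar}"
  "$pulsar" initialize-cluster-metadata \
    --cluster "cluster$n" \
    --zookeeper "localhost:$((2181 + n))" \
    --configuration-store localhost:2181 \
    --web-service-url "http://localhost:$((8079 + n))/" \
    --broker-service-url "pulsar://localhost:$((6649 + n))/"
}
```

Calling init_cluster_metadata 1 and then init_cluster_metadata 2 reproduces the two commands above. A third cluster would simply be init_cluster_metadata 3, provided its ZooKeeper and broker were configured with the matching ports.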

5. Initialize the BookKeeper metadata and start the bookie cluster.

$PULSAR_HOME/bk-cluster/bk1/bin/bookkeeper shell metaformat

$PULSAR_HOME/bk-cluster/bk1/bin/pulsar-daemon start bookie
$PULSAR_HOME/bk-cluster/bk2/bin/pulsar-daemon start bookie
$PULSAR_HOME/bk-cluster/bk3/bin/pulsar-daemon start bookie
$PULSAR_HOME/bk-cluster/bk4/bin/pulsar-daemon start bookie

6. Start brokers in cluster1 and cluster2.

$PULSAR_HOME/cluster1/broker1/bin/pulsar-daemon start broker
$PULSAR_HOME/cluster2/broker1/bin/pulsar-daemon start broker
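Steps 5 and 6 start each bookie and broker one by one. Here is a minimal sketch of the same start-up order as loops. It assumes bash, a set PULSAR_HOME, and the directory layout from step 3. start_bookies_and_brokers is a hypothetical helper, and the bookkeeper shell metaformat command from step 5 still has to run once beforehand.

```shell
# Hypothetical helper: start every bookie, then every broker, using the
# layout from step 3 (bk-cluster/bk*/ and cluster*/broker1/).
# Bookies start first so that brokers can find them when they come up.
start_bookies_and_brokers() {
  local dir
  for dir in "$PULSAR_HOME"/bk-cluster/bk*/; do
    "${dir}bin/pulsar-daemon" start bookie || return 1
  done
  for dir in "$PULSAR_HOME"/cluster*/broker1/; do
    "${dir}bin/pulsar-daemon" start broker || return 1
  done
}
```

The globs expand in lexical order, so bk10 would start before bk2. That ordering does not matter for starting bookies.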

7. Check that each broker is registered in its cluster.

$PULSAR_HOME/cluster1/broker1/bin/pulsar-admin --admin-url http://localhost:8080 brokers list cluster1
"localhost:8080"
$PULSAR_HOME/cluster1/broker1/bin/pulsar-admin --admin-url http://localhost:8081 brokers list cluster2
"localhost:8081"

8. Check the bookie lists of cluster1 and cluster2. Both clusters list the same four bookies, which confirms that they share one bookie cluster.

$PULSAR_HOME/cluster1/broker1/bin/pulsar-admin --admin-url http://localhost:8080 bookies list-bookies
{
  "bookies" : [ {
    "bookieId" : "127.0.0.1:3181"
  }, {
    "bookieId" : "127.0.0.1:3182"
  }, {
    "bookieId" : "127.0.0.1:3183"
  }, {
    "bookieId" : "127.0.0.1:3184"
  } ]
}
$PULSAR_HOME/cluster1/broker1/bin/pulsar-admin --admin-url http://localhost:8081 bookies list-bookies
{
  "bookies" : [ {
    "bookieId" : "127.0.0.1:3181"
  }, {
    "bookieId" : "127.0.0.1:3182"
  }, {
    "bookieId" : "127.0.0.1:3183"
  }, {
    "bookieId" : "127.0.0.1:3184"
  } ]
}
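To check this from a script, extract and sort the bookieId values from the two outputs and compare them. This minimal sketch uses grep, sed and sort, and it assumes the "bookieId" : "host:port" layout printed above. A JSON tool such as jq would be more robust. The function names and the file names in the usage note are hypothetical.

```shell
# Print the bookie IDs found in `pulsar-admin bookies list-bookies`
# output (read from stdin), one per line, sorted.
bookie_ids() {
  grep -o '"bookieId" *: *"[^"]*"' | sed 's/.*: *"\(.*\)"/\1/' | sort
}

# Succeed if two saved list-bookies outputs name exactly the same bookies.
same_bookies() {
  diff <(bookie_ids < "$1") <(bookie_ids < "$2") > /dev/null
}
```

For example, save the output of list-bookies for http://localhost:8080 to c1.json and for http://localhost:8081 to c2.json. Then same_bookies c1.json c2.json succeeds when both clusters see the same bookie cluster.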

Bookie Rack Placement

To achieve resource isolation, we split the four bookie nodes into two resource groups.

1. Set the bookie rack for cluster1.

$PULSAR_HOME/cluster1/broker1/bin/pulsar-admin --admin-url http://localhost:8080 bookies set-bookie-rack \
--bookie 127.0.0.1:3181 \
--hostname 127.0.0.1:3181 \
--group group-bookie1 \
--rack rack1

$PULSAR_HOME/cluster1/broker1/bin/pulsar-admin --admin-url http://localhost:8080 bookies set-bookie-rack \
--bookie 127.0.0.1:3182 \
--hostname 127.0.0.1:3182 \
--group group-bookie1 \
--rack rack1


$PULSAR_HOME/cluster1/broker1/bin/pulsar-admin --admin-url http://localhost:8080 bookies set-bookie-rack \
--bookie 127.0.0.1:3183 \
--hostname 127.0.0.1:3183 \
--group group-bookie2 \
--rack rack2


$PULSAR_HOME/cluster1/broker1/bin/pulsar-admin --admin-url http://localhost:8080 bookies set-bookie-rack \
--bookie 127.0.0.1:3184 \
--hostname 127.0.0.1:3184 \
--group group-bookie2 \
--rack rack2

2. Check the bookie rack placement in cluster1.

$PULSAR_HOME/cluster1/broker1/bin/pulsar-admin --admin-url http://localhost:8080 bookies racks-placement
"group-bookie1    {127.0.0.1:3181=BookieInfoImpl(rack=rack1, hostname=127.0.0.1:3181), 127.0.0.1:3182=BookieInfoImpl(rack=rack1, hostname=127.0.0.1:3182)}"
"group-bookie2    {127.0.0.1:3183=BookieInfoImpl(rack=rack2, hostname=127.0.0.1:3183), 127.0.0.1:3184=BookieInfoImpl(rack=rack2, hostname=127.0.0.1:3184)}"

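3. Set the bookie racks for cluster2. Rack placement is configured separately in each Pulsar cluster, so we apply the same assignments through cluster2’s admin endpoint (http://localhost:8081).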

$PULSAR_HOME/cluster1/broker1/bin/pulsar-admin --admin-url http://localhost:8081 bookies set-bookie-rack \
--bookie 127.0.0.1:3181 \
--hostname 127.0.0.1:3181 \
--group group-bookie1 \
--rack rack1

$PULSAR_HOME/cluster1/broker1/bin/pulsar-admin --admin-url http://localhost:8081 bookies set-bookie-rack \
--bookie 127.0.0.1:3182 \
--hostname 127.0.0.1:3182 \
--group group-bookie1 \
--rack rack1


$PULSAR_HOME/cluster1/broker1/bin/pulsar-admin --admin-url http://localhost:8081 bookies set-bookie-rack \
--bookie 127.0.0.1:3183 \
--hostname 127.0.0.1:3183 \
--group group-bookie2 \
--rack rack2


$PULSAR_HOME/cluster1/broker1/bin/pulsar-admin --admin-url http://localhost:8081 bookies set-bookie-rack \
--bookie 127.0.0.1:3184 \
--hostname 127.0.0.1:3184 \
--group group-bookie2 \
--rack rack2
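The eight set-bookie-rack calls above (four per cluster) differ only in the bookie port, group, rack and admin URL. Here is a minimal sketch that applies this tutorial's rack layout through one cluster's admin endpoint. It assumes bash and a set PULSAR_HOME. assign_racks and the PULSAR_ADMIN override are hypothetical; set PULSAR_ADMIN=echo for a dry run.

```shell
# Hypothetical helper: apply this tutorial's rack layout through one
# cluster's admin endpoint. Set PULSAR_ADMIN=echo for a dry run.
# Layout: ports 3181-3182 -> group-bookie1/rack1, 3183-3184 -> group-bookie2/rack2.
assign_racks() {
  local admin_url="$1"
  local admin="${PULSAR_ADMIN:-$PULSAR_HOME/cluster1/broker1/bin/pulsar-admin}"
  local port group rack
  for port in 3181 3182 3183 3184; do
    if ((port <= 3182)); then
      group=group-bookie1 rack=rack1
    else
      group=group-bookie2 rack=rack2
    fi
    "$admin" --admin-url "$admin_url" bookies set-bookie-rack \
      --bookie "127.0.0.1:$port" \
      --hostname "127.0.0.1:$port" \
      --group "$group" \
      --rack "$rack" || return 1
  done
}
```

Running assign_racks http://localhost:8080 and then assign_racks http://localhost:8081 issues the same eight commands shown above. Keeping the layout in one place makes it harder for the two clusters' rack configurations to drift apart.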

4. Check the bookie rack placement in cluster2.

$PULSAR_HOME/cluster1/broker1/bin/pulsar-admin --admin-url http://localhost:8081 bookies racks-placement
"group-bookie1    {127.0.0.1:3181=BookieInfoImpl(rack=rack1, hostname=127.0.0.1:3181), 127.0.0.1:3182=BookieInfoImpl(rack=rack1, hostname=127.0.0.1:3182)}"
"group-bookie2    {127.0.0.1:3183=BookieInfoImpl(rack=rack2, hostname=127.0.0.1:3183), 127.0.0.1:3184=BookieInfoImpl(rack=rack2, hostname=127.0.0.1:3184)}"

Verify Namespace Isolation by Bookie Affinity Group

Now that everything is configured, let’s verify that the bookie affinity group setting isolates a namespace’s data.

1. Create a namespace in cluster1.

$PULSAR_HOME/cluster1/broker1/bin/pulsar-admin --admin-url http://localhost:8080 namespaces create -b 30 -c cluster1 public/c1-ns1

2. Set a bookie affinity group for the namespace.

$PULSAR_HOME/cluster1/broker1/bin/pulsar-admin --admin-url http://localhost:8080 namespaces set-bookie-affinity-group public/c1-ns1 \
--primary-group group-bookie1

3. Check the bookie affinity group of the namespace.

$PULSAR_HOME/cluster1/broker1/bin/pulsar-admin --admin-url http://localhost:8080 namespaces get-bookie-affinity-group public/c1-ns1

4. Produce some messages to a topic of the namespace public/c1-ns1.

# Set retention for namespace public/c1-ns1 so that messages are not deleted automatically
$PULSAR_HOME/cluster1/broker1/bin/pulsar-admin --admin-url http://localhost:8080 namespaces set-retention -s 1g -t 3d public/c1-ns1
$PULSAR_HOME/cluster1/broker1/bin/pulsar-client --url pulsar://localhost:6650 produce -m 'hello' -n 300 public/c1-ns1/t1

5. Check the internal stats of the topic.

$PULSAR_HOME/cluster1/broker1/bin/pulsar-admin --admin-url http://localhost:8080 topics stats-internal public/c1-ns1/t1

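We should get a list of the ledgers in the topic: in this case, ledgers 0, 2, and 3. Ledger IDs will differ in your environment. The number of ledgers the topic uses depends on the broker’s ledger rollover settings (such as managedLedgerMaxEntriesPerLedger).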

"ledgers" : [ {
    "ledgerId" : 0,
    "entries" : 100,
    "size" : 5400,
    "offloaded" : false,
    "underReplicated" : false
  }, {
    "ledgerId" : 2,
    "entries" : 100,
    "size" : 5616,
    "offloaded" : false,
    "underReplicated" : false
  }, {
    "ledgerId" : 3,
    "entries" : 100,
    "size" : 5700,
    "offloaded" : false,
    "underReplicated" : false
  } ]
  

Check the ensembles of each ledger to confirm that it was written to bookies in group-bookie1.

$PULSAR_HOME/bk-cluster/bk1/bin/bookkeeper shell ledgermetadata -ledgerid 0
# check ensembles
ensembles={0=[127.0.0.1:3181, 127.0.0.1:3182]}

$PULSAR_HOME/bk-cluster/bk1/bin/bookkeeper shell ledgermetadata -ledgerid 2
# check ensembles
ensembles={0=[127.0.0.1:3182, 127.0.0.1:3181]}

$PULSAR_HOME/bk-cluster/bk1/bin/bookkeeper shell ledgermetadata -ledgerid 3
# check ensembles
ensembles={0=[127.0.0.1:3182, 127.0.0.1:3181]}
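This per-ledger check can be scripted. This minimal sketch assumes that stats-internal prints "ledgerId" : N entries and that ledgermetadata output contains an ensembles={...} field, as in the outputs above. It has two parts: ledger_ids lists the topic's ledger IDs, and ensemble_within succeeds only if every bookie in a ledger's ensembles belongs to an allowed set. Both function names are hypothetical.

```shell
# List ledger IDs from `pulsar-admin topics stats-internal` output (stdin).
ledger_ids() {
  grep -o '"ledgerId" *: *[0-9]*' | sed 's/.*: *//'
}

# Read `bookkeeper shell ledgermetadata` output on stdin and succeed only if
# every bookie in the ensembles field is one of the allowed bookies (args).
ensemble_within() {
  local bookies bookie allowed found
  # ensembles={0=[127.0.0.1:3181, 127.0.0.1:3182]} -> one bookie per line
  bookies=$(grep -o 'ensembles={[^}]*}' | grep -o '\[[^]]*]' \
    | sed 's/^\[//; s/]$//' | tr ',' '\n' | tr -d ' ')
  [ -n "$bookies" ] || return 1   # no ensembles field found
  for bookie in $bookies; do
    found=no
    for allowed in "$@"; do
      [ "$bookie" = "$allowed" ] && found=yes
    done
    [ "$found" = yes ] || return 1
  done
}
```

For example, pipe the stats-internal output for public/c1-ns1/t1 into ledger_ids. Then, for each ID, pipe the output of bookkeeper shell ledgermetadata -ledgerid "$id" into ensemble_within 127.0.0.1:3181 127.0.0.1:3182. Any ledger that fails has at least one replica outside group-bookie1.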

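6. Repeat these steps in cluster2 (using --admin-url http://localhost:8081 and --url pulsar://localhost:6651), and create a cluster2 namespace whose primary group is group-bookie2. Because each cluster writes to its own affinity group, cluster1’s namespaces are isolated from cluster2’s.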

Migrate Namespace

Migrate Bookie Affinity Group

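Now that we have verified namespace isolation, suppose the bookies in a group run low on disk space. In that case, we can move a namespace to a different bookie affinity group. New ledgers are then created on the bookies of the new group, while existing ledgers stay where they are.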

1. Modify the bookie affinity group of the namespace.

$PULSAR_HOME/cluster1/broker1/bin/pulsar-admin --admin-url http://localhost:8080 namespaces set-bookie-affinity-group public/c1-ns1 --primary-group group-bookie2

2. Unload the namespace to make the bookie affinity group change take effect.

$PULSAR_HOME/cluster1/broker1/bin/pulsar-admin --admin-url http://localhost:8080 namespaces unload public/c1-ns1

3. Produce messages to the topic public/c1-ns1/t1 again.

$PULSAR_HOME/cluster1/broker1/bin/pulsar-client --url pulsar://localhost:6650 produce -m 'hello' -n 300 public/c1-ns1/t1

4. Check the ensembles of the newly added ledgers. The internal stats now list new ledgers (15, 16, and 17 in this example), which should be placed on group-bookie2.

$PULSAR_HOME/cluster1/broker1/bin/pulsar-admin --admin-url http://localhost:8080 topics stats-internal public/c1-ns1/t1
  "ledgers" : [ {
    "ledgerId" : 0,
    "entries" : 100,
    "size" : 5400,
    "offloaded" : false,
    "underReplicated" : false
  }, {
    "ledgerId" : 2,
    "entries" : 100,
    "size" : 5616,
    "offloaded" : false,
    "underReplicated" : false
  }, {
    "ledgerId" : 3,
    "entries" : 100,
    "size" : 5700,
    "offloaded" : false,
    "underReplicated" : false
  }, {
    "ledgerId" : 15,
    "entries" : 100,
    "size" : 5400,
    "offloaded" : false,
    "underReplicated" : false
  }, {
    "ledgerId" : 16,
    "entries" : 100,
    "size" : 5616,
    "offloaded" : false,
    "underReplicated" : false
  }, {
    "ledgerId" : 17,
    "entries" : 100,
    "size" : 5700,
    "offloaded" : false,
    "underReplicated" : false
  }]
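To pick out the new ledgers without reading the JSON by eye, save the stats-internal output twice. Take one copy before the affinity group change and one after producing again. The new ledgers are the IDs that appear only in the second file.

This is a minimal sketch using comm. It assumes the "ledgerId" : N layout shown above. new_ledgers and the file names before.json and after.json are hypothetical.

```shell
# Print ledger IDs that appear in the second saved stats-internal output
# but not in the first, i.e. ledgers created between the two snapshots.
new_ledgers() {
  # comm needs both inputs sorted the same way (lexically here)
  comm -13 \
    <(grep -o '"ledgerId" *: *[0-9]*' "$1" | sed 's/.*: *//' | sort) \
    <(grep -o '"ledgerId" *: *[0-9]*' "$2" | sed 's/.*: *//' | sort) \
    | sort -n
}
```

With the outputs in this section saved as before.json and after.json, new_ledgers before.json after.json would print 15, 16 and 17. Those are the ledgers to check against group-bookie2.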
  

Let’s check the ensembles of the newly added ledgers (15, 16, and 17) to confirm that they were written to bookies in group-bookie2.

$PULSAR_HOME/bk-cluster/bk1/bin/bookkeeper shell ledgermetadata -ledgerid 15
# check ensembles
ensembles={0=[127.0.0.1:3184, 127.0.0.1:3183]}

$PULSAR_HOME/bk-cluster/bk1/bin/bookkeeper shell ledgermetadata -ledgerid 16
# check ensembles
ensembles={0=[127.0.0.1:3183, 127.0.0.1:3184]}

$PULSAR_HOME/bk-cluster/bk1/bin/bookkeeper shell ledgermetadata -ledgerid 17
# check ensembles
ensembles={0=[127.0.0.1:3183, 127.0.0.1:3184]}

Scale Bookies Up and Down

Eventually our data volume will grow beyond the capacity of our BookKeeper cluster, and we will need to scale up the number of bookies. In this section we will show you how to add a new bookie and assign it to an existing bookie affinity group.

Scale up

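1. Start a new bookie node, bk5. Configure it like bk1 to bk4, but give it its own bookie port (3185, matching the bookie ID 127.0.0.1:3185 used below).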

cp -r apache-pulsar-2.8.1 bk-cluster/bk5
$PULSAR_HOME/bk-cluster/bk5/bin/pulsar-daemon start bookie
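The start command above assumes that bk5 is already configured and does not collide with bk1 to bk4. Its bookie ID, 127.0.0.1:3185, implies bookiePort=3185 in bk-cluster/bk5/conf/bookkeeper.conf. Here is a minimal sketch of that edit. set_conf is a hypothetical helper. Any other per-bookie settings you changed for bk1 to bk4 (for example, other listening ports) need the same treatment.

```shell
# Hypothetical helper: set key=value in a properties-style config file such
# as conf/bookkeeper.conf. An existing "key=" line is replaced; otherwise the
# pair is appended. Values must not contain "|" or "&" (sed replacement).
# Writing back through cat keeps the original file's permissions.
set_conf() {
  local file="$1" key="$2" value="$3" tmp
  tmp=$(mktemp) || return 1
  if grep -q "^${key}=" "$file"; then
    sed "s|^${key}=.*|${key}=${value}|" "$file" > "$tmp"
  else
    cat "$file" > "$tmp"
    printf '%s=%s\n' "$key" "$value" >> "$tmp"
  fi
  cat "$tmp" > "$file"
  rm -f "$tmp"
}
```

Run set_conf "$PULSAR_HOME/bk-cluster/bk5/conf/bookkeeper.conf" bookiePort 3185 after the cp command and before starting the bookie.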

2. Add the new bookie node to group-bookie2.

$PULSAR_HOME/cluster1/broker1/bin/pulsar-admin --admin-url http://localhost:8080 bookies set-bookie-rack \
--bookie 127.0.0.1:3185 \
--hostname 127.0.0.1:3185 \
--group group-bookie2 \
--rack rack2

3. Check the bookie rack placement.

$PULSAR_HOME/cluster1/broker1/bin/pulsar-admin --admin-url http://localhost:8080  bookies racks-placement
"group-bookie1    {127.0.0.1:3181=BookieInfoImpl(rack=rack1, hostname=127.0.0.1:3181), 127.0.0.1:3182=BookieInfoImpl(rack=rack1, hostname=127.0.0.1:3182)}"
"group-bookie2    {127.0.0.1:3183=BookieInfoImpl(rack=rack2, hostname=127.0.0.1:3183), 127.0.0.1:3184=BookieInfoImpl(rack=rack2, hostname=127.0.0.1:3184), 127.0.0.1:3185=BookieInfoImpl(rack=rack2, hostname=127.0.0.1:3185)}"

4. Unload the namespace public/c1-ns1 to make the bookie affinity group change take effect.

$PULSAR_HOME/cluster1/broker1/bin/pulsar-admin --admin-url http://localhost:8080 namespaces unload public/c1-ns1

5. Produce some messages to the topic public/c1-ns1/t1 again.

$PULSAR_HOME/cluster1/broker1/bin/pulsar-client --url pulsar://localhost:6650 produce -m 'hello' -n 300 public/c1-ns1/t1

6. Check the newly added ledger of the topic public/c1-ns1/t1. Find its ID in the stats-internal output, then pass it to ledgermetadata in place of <ledger-id>.

$PULSAR_HOME/cluster1/broker1/bin/pulsar-admin --admin-url http://localhost:8080 topics stats-internal public/c1-ns1/t1
$PULSAR_HOME/bk-cluster/bk1/bin/bookkeeper shell ledgermetadata -ledgerid <ledger-id>

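The ensembles of new ledgers can now include the newly added bookie, 127.0.0.1:3185. group-bookie2 now has three bookies and each ensemble in this example uses two of them. As a result, not every new ledger is guaranteed to land on the new bookie.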
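To avoid copying the ledger ID by hand, take the highest ID from the stats-internal output. This minimal sketch assumes that ledger IDs increase over time, as they do in the outputs above, so the highest ID belongs to the most recently created ledger. latest_ledger_id is a hypothetical helper.

```shell
# Print the highest ledger ID in `pulsar-admin topics stats-internal`
# output (stdin), i.e. the topic's most recently created ledger.
latest_ledger_id() {
  grep -o '"ledgerId" *: *[0-9]*' | sed 's/.*: *//' | sort -n | tail -n 1
}
```

For example, pipe the stats-internal output for public/c1-ns1/t1 into latest_ledger_id to get id. Then run $PULSAR_HOME/bk-cluster/bk1/bin/bookkeeper shell ledgermetadata -ledgerid "$id".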

Scale down

In a distributed system, it is not uncommon for an individual component to fail or to be retired. In this section, we simulate the loss of one bookie by taking it out of service, and we show that the shared BookKeeper cluster tolerates its removal. For a detailed walkthrough, see https://bookkeeper.apache.org/docs/4.14.0/admin/decomission/.

1. Make sure there are enough bookies in the affinity group.

For example, if the configuration managedLedgerDefaultEnsembleSize of the broker is 2, then after we scale down the bookies we should have at least 2 bookies belonging to the affinity group.

We can check the bookie rack placement.

$PULSAR_HOME/cluster1/broker1/bin/pulsar-admin --admin-url http://localhost:8080 bookies racks-placement
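This check can be automated by counting the bookies listed for a group in the racks-placement output. This minimal sketch assumes the "group-name    {id=BookieInfoImpl(...), ...}" format shown earlier. group_size and can_remove_one are hypothetical helpers. The ensemble size argument should match managedLedgerDefaultEnsembleSize in your broker configuration.

```shell
# Count the bookies listed for one group in `bookies racks-placement`
# output (stdin). Each group line looks like:
# "group-bookie2    {127.0.0.1:3183=BookieInfoImpl(...), ...}"
group_size() {
  grep "^\"$1[[:space:]]" | grep -o 'BookieInfoImpl(' | wc -l | tr -d ' '
}

# Succeed if the group still has at least <ensemble-size> bookies after
# one bookie is removed. Reads racks-placement output from stdin.
can_remove_one() {
  local group="$1" ensemble_size="$2" size
  size=$(group_size "$group")
  [ $((size - 1)) -ge "$ensemble_size" ]
}
```

With the placement shown in the Scale up section, group-bookie2 has three bookies, so piping that output into can_remove_one group-bookie2 2 succeeds. The same check for group-bookie1 (two bookies) fails.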

2. Remove the bookie from its bookie affinity group.

$PULSAR_HOME/cluster1/broker1/bin/pulsar-admin --admin-url http://localhost:8080 bookies delete-bookie-rack -b 127.0.0.1:3185

3. Check whether there are any under-replicated ledgers. According to the BookKeeper decommissioning guide linked above, the decommission command in step 5 forces under-replicated ledgers to be replicated.

$PULSAR_HOME/bk-cluster/bk1/bin/bookkeeper shell listunderreplicated

4. Stop the bookie.

$PULSAR_HOME/bk-cluster/bk5/bin/pulsar-daemon stop bookie

5. Decommission the bookie.

$PULSAR_HOME/bk-cluster/bk1/bin/bookkeeper shell decommissionbookie -bookieid 127.0.0.1:3185

6. Check that no ledgers remain on the decommissioned bookie.

$PULSAR_HOME/bk-cluster/bk1/bin/bookkeeper shell listledgers -bookieid 127.0.0.1:3185

7. List the read-write bookies to confirm that 127.0.0.1:3185 is no longer among them.

$PULSAR_HOME/bk-cluster/bk1/bin/bookkeeper shell listbookies -rw -h

What’s Next

We have shown you how to achieve isolation with two Pulsar clusters sharing one BookKeeper cluster. You can attach more Pulsar clusters to the same BookKeeper cluster by following the same steps. Stay tuned for the last blog in this series, where we show you how to achieve isolation with a single Pulsar cluster!

Meanwhile, check out the Pulsar resources below:

  1. Take the 10-minute 2022 Apache Pulsar User Survey now to help the Pulsar community improve the project.
  2. Get your free copy of Manning's Apache Pulsar in Action by David Kjerrumgaard.
  3. Join the 2022 StreamNative Ambassador Program and work directly with Pulsar experts from StreamNative to co-host events, promote new project updates, and build the Pulsar user group in your city.
  4. Join the Pulsar community on Slack.

Ran Gao
Ran Gao is a software engineer at StreamNative. Before that, he was responsible for the development of search service at Zhaopin.com. Prior to that, he worked on the development of the logistics system at JD Logistics. Being interested in open source and messaging systems, Ran is an Apache Pulsar committer.
