Video | May 29, 2025 | 35 min

Pulsar Cluster Migration Best Practices: In-Place vs. Geo-Replication Approaches

Session Overview

Unlock best practices for migrating Apache Pulsar clusters with Max Xu. Learn strategies for data integrity, cost control, and real-time processing.

TL;DR

Migrating an Apache Pulsar cluster becomes necessary in cloud environments when teams switch regions or upgrade infrastructure. Max Xu outlines two primary migration strategies, in-place migration and geo-replication, each with distinct benefits and challenges. The session provides a roadmap for executing either migration while preserving data integrity and controlling cost.

Opening

Imagine needing to shift a thriving Apache Pulsar cluster seamlessly across regions or infrastructures without disrupting its critical data flow. As cloud environments evolve, such migrations become inevitable, driven by factors like geographic shifts or transitioning from self-hosted solutions to managed services. Max Xu, a Staff Software Engineer at StreamNative, opens the discussion with this pressing need, setting the stage for a deep dive into best practices for successful Pulsar cluster migrations.

What You'll Learn (Key Takeaways)

  • In-Place Migration Strategy – This approach involves expanding and then shrinking the cluster, maintaining a single data environment, which simplifies client application interactions and retains the original message IDs.
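  • Geo-Replication Strategy – This approach uses Pulsar's built-in geo-replication to copy data between independent clusters. Replication is automated and can double as a backup mechanism, and neither cluster needs access to the other's metadata store, although the source brokers must still be able to reach the target cluster over the network.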
  • Real-World Application – StreamNative's in-place migration helped AWS customers reduce infrastructure costs and improve technical support, illustrating the practical benefits of well-executed migrations.
  • Operational Efficiency – Understanding the resource requirements and potential bottlenecks in each strategy is crucial for optimizing migration speed and minimizing performance impacts.
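
As a rough illustration of the geo-replication strategy above, the sketch below assembles the source-side `pulsar-admin` command lines that typically enable replication for one namespace. It only builds and prints the commands and does not run them. The tenant, namespace, cluster names, and URLs are hypothetical placeholders, and the session itself may have used different steps. The namespace would also need to exist on the target cluster, which is not shown here.

```python
import shlex


def geo_replication_commands(tenant, namespace, source, target,
                             target_web_url, target_broker_url):
    """Build pulsar-admin command lines to run against the source cluster.

    All argument values are supplied by the caller; nothing here is
    executed, so the result can be reviewed before use.
    """
    clusters = f"{source},{target}"
    return [
        # Register the target cluster so source brokers know where to connect.
        ["pulsar-admin", "clusters", "create", target,
         "--url", target_web_url, "--broker-url", target_broker_url],
        # Allow the tenant on both clusters (this replaces the existing list).
        ["pulsar-admin", "tenants", "update", tenant,
         "--allowed-clusters", clusters],
        # Turn on replication for the namespace across both clusters.
        ["pulsar-admin", "namespaces", "set-clusters",
         f"{tenant}/{namespace}", "--clusters", clusters],
    ]


# Hypothetical names and endpoints, for illustration only.
commands = geo_replication_commands(
    tenant="acme",
    namespace="orders",
    source="old-cluster",
    target="new-cluster",
    target_web_url="https://new.example.com:8443",
    target_broker_url="pulsar+ssl://new.example.com:6651",
)

for cmd in commands:
    print(shlex.join(cmd))
```

Generating the commands as data rather than running them keeps the sketch safe to try anywhere and makes the order of operations easy to review.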

Q&A Highlights

Q: How does Pulsar's geo-replication compare to Kafka's MirrorMaker? A: Pulsar's geo-replication is built into the broker, requiring no additional components, whereas Kafka's MirrorMaker involves setting up external jobs to replicate data between clusters.

Q: What options exist if the Pulsar cluster is a managed service and you have no access to the underlying resources needed for in-place migration? A: Geo-replication can be used. To migrate all historical data, including offloaded data, you may need to set up subscriptions and data replication manually.

Q: How does geo-replication work between clusters running different metadata stores, like ZooKeeper and etcd? A: Geo-replication treats clusters as independent entities and replicates data using Pulsar's protocol, so neither cluster needs access to the other's metadata store and the two remain compatible.

Q: What happens if the target cluster has fewer resources than the source cluster? A: For in-place migration, insufficient resources can slow down auto-recovery in BookKeeper. For geo-replication, they can cause backlogs to build up in the source cluster, which affects performance. Increasing the target's resources is advised to mitigate both issues.
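
To make the backlog effect concrete, here is a deliberately simplified linear model of how a replication backlog on the source cluster grows when the target cannot keep up. The function names and the throughput figures are illustrative assumptions, not numbers from the session, and the model ignores bursts, retention limits, and throttling.

```python
def backlog_after(produce_mb_s, replicate_mb_s, seconds, initial_backlog_mb=0.0):
    """Estimated source-side backlog (MB) after `seconds`, never below zero."""
    net_growth = produce_mb_s - replicate_mb_s
    return max(0.0, initial_backlog_mb + net_growth * seconds)


def seconds_to_drain(backlog_mb, produce_mb_s, replicate_mb_s):
    """Seconds needed to drain the backlog, or None if it can never drain."""
    surplus = replicate_mb_s - produce_mb_s
    if surplus <= 0:
        return None  # the target replicates no faster than producers write
    return backlog_mb / surplus


# Hypothetical workload: producers write 50 MB/s, the target absorbs 40 MB/s.
one_hour_backlog = backlog_after(produce_mb_s=50, replicate_mb_s=40, seconds=3600)
print(f"backlog after one hour: {one_hour_backlog:.0f} MB")

# After scaling the target up to 60 MB/s, the same backlog starts to drain.
drain_time = seconds_to_drain(one_hour_backlog, produce_mb_s=50, replicate_mb_s=60)
print(f"time to drain after scaling up: {drain_time:.0f} s")
```

Under these assumed rates, a 10 MB/s shortfall accumulates steadily, and the backlog only shrinks once the target's replication throughput exceeds the produce rate.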

About Speaker

Max Xu, Software Engineer