Since its inception, Apache Kafka has been widely recognized for its robust data streaming capabilities, making it the go-to solution for numerous companies handling real-time data. However, Kafka’s architecture has its own limitations, including issues with scalability, rebalancing, node failure management, cloud-native compatibility, and jitter. In light of these challenges, organizations using Kafka are exploring alternative systems in the streaming space, such as Apache Pulsar.
Pulsar has been making waves in the messaging and streaming domain. Pulsar's creation was inspired by Kafka's classic architecture, and it shares familiar concepts like topics and brokers. However, it takes an entirely different approach to managing compute and storage. Born for the cloud-native era, Pulsar has a decoupled architecture that lets its compute and storage layers scale independently. This design effectively solves some of the key issues Kafka users face. Pulsar also natively provides a suite of enterprise-grade features, including geo-replication, multi-tenancy, and tiered storage, which makes it an attractive alternative for Kafka users.
Nevertheless, Kafka has long been the primary streaming solution for many organizations, and their applications are already tightly coupled to it. They might be reluctant to migrate for organizational, operational, or technical reasons.
This raises an interesting question: Is there a way for organizations to keep using their Kafka applications without major changes while leveraging Pulsar’s infrastructure and superior messaging and streaming technology?
Pulsar features a protocol handler mechanism that lets teams get the best of both worlds. StreamNative has implemented the Kafka wire protocol on top of components that Pulsar already has, such as topic discovery, the distributed log library ManagedLedger, and cursors. StreamNative Cloud provides fully managed Pulsar services in the cloud and has built-in support for the Kafka protocol with enterprise features. It enables teams to take advantage of Pulsar's distinct features, such as multi-tenancy and tiered storage, while continuing to use their existing Kafka applications.
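To make "continuing to use their existing Kafka applications" concrete, here is a minimal Python sketch. It treats an existing Kafka client configuration as a plain dictionary and repoints it at a Kafka-protocol endpoint. The property names are standard Kafka client settings. The endpoint address and the SASL values are hypothetical placeholders, so check StreamNative Cloud's documentation for the real connection and authentication settings. The helper `repoint_to_pulsar` is illustrative only and is not part of any library.

```python
# Minimal sketch: repoint an existing Kafka client configuration at a
# Kafka-protocol endpoint. The endpoint and SASL values used below are
# hypothetical placeholders, not real StreamNative Cloud settings.

EXISTING_CONFIG = {
    "bootstrap.servers": "kafka-broker-1.internal:9092",
    "group.id": "payments-consumer",
    "key.deserializer": "org.apache.kafka.common.serialization.StringDeserializer",
    "value.deserializer": "org.apache.kafka.common.serialization.StringDeserializer",
    "auto.offset.reset": "earliest",
}

# Connection-level properties: the only ones this sketch allows to change.
CONNECTION_KEYS = {"bootstrap.servers", "security.protocol", "sasl.mechanism", "sasl.jaas.config"}


def repoint_to_pulsar(config, endpoint, connection_overrides):
    """Return a copy of `config` aimed at `endpoint`; application settings stay untouched."""
    unexpected = set(connection_overrides) - CONNECTION_KEYS
    if unexpected:
        raise ValueError(f"not a connection setting: {sorted(unexpected)}")
    new_config = dict(config)  # never mutate the original configuration
    new_config["bootstrap.servers"] = endpoint
    new_config.update(connection_overrides)
    return new_config


if __name__ == "__main__":
    migrated = repoint_to_pulsar(
        EXISTING_CONFIG,
        endpoint="my-cluster.example.com:9093",  # hypothetical endpoint
        connection_overrides={"security.protocol": "SASL_SSL", "sasl.mechanism": "PLAIN"},  # assumed auth settings
    )
    changed = sorted(k for k in migrated if migrated.get(k) != EXISTING_CONFIG.get(k))
    print(changed)  # ['bootstrap.servers', 'sasl.mechanism', 'security.protocol']
```

The helper rejects overrides for anything other than connection properties. In this sketch, that makes explicit that the group ID, deserializers, and offset-reset behavior carry over unchanged.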
Future-proof Kafka applications with Pulsar
The most important benefit of the Kafka protocol on StreamNative Cloud is that it allows organizations to harness the strengths of both systems without disrupting their legacy Kafka applications. With a unified event streaming platform, they can take advantage of the following features that Pulsar has to offer.
- Unified streaming and queuing
- Streamlined operations with enterprise-grade multi-tenancy
- Enhanced scalability and elasticity with a rebalance-free architecture
- Infinite data retention with Apache BookKeeper and tiered storage
Now, let's take a closer look at each of them and see how Pulsar helps solve some of the key pain points of Kafka.
Unified streaming and queuing
Pulsar can handle both real-time streaming scenarios, as Kafka does, and traditional message queuing, as RabbitMQ or ActiveMQ do. With the Kafka protocol on StreamNative Cloud, organizations that maintain multiple systems for different use cases can manage both streaming and messaging semantics on a single platform.
This ability comes from two Pulsar features. The first is its four subscription types (Exclusive, Shared, Failover, and Key_Shared). The second is individual (selective) acknowledgment of messages. Subscription types define how messages are delivered to the consumers of a topic. Because a single topic can have multiple subscriptions of different types, the same topic can serve both queuing and streaming use cases. Individual acknowledgment means that you can acknowledge messages one by one. This is where Kafka falls short: it only lets you commit an offset, which acknowledges all messages before that offset as a batch. (Pulsar supports this kind of cumulative acknowledgment as well.)
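The difference between offset commits and individual acknowledgment is easiest to see in a toy model. The sketch below is a simplified, in-memory Python model written for illustration. It is not the Kafka or Pulsar client API, and the class names are hypothetical. In the scenario, five messages arrive: message 1 is still being retried, while messages 0, 2, 3, and 4 are done.

```python
from __future__ import annotations


class OffsetCommitConsumer:
    """Kafka-style: committing through offset N marks every message up to N as done."""

    def __init__(self, num_messages: int):
        self.num_messages = num_messages
        self.next_offset = 0  # position a restarted consumer resumes from

    def commit_through(self, offset: int) -> None:
        # Kafka stores the *next* offset to read, i.e. offset + 1.
        self.next_offset = max(self.next_offset, offset + 1)

    def redelivered_after_restart(self) -> list[int]:
        return list(range(self.next_offset, self.num_messages))


class IndividualAckConsumer:
    """Pulsar-style: each message is acknowledged on its own."""

    def __init__(self, num_messages: int):
        self.num_messages = num_messages
        self.acked: set[int] = set()

    def ack(self, msg_id: int) -> None:
        self.acked.add(msg_id)

    def ack_cumulative(self, msg_id: int) -> None:
        # Cumulative ack: everything up to and including msg_id
        # (in Pulsar, not available on Shared or Key_Shared subscriptions).
        self.acked.update(range(msg_id + 1))

    def redelivered_after_restart(self) -> list[int]:
        return [m for m in range(self.num_messages) if m not in self.acked]


if __name__ == "__main__":
    done = [0, 2, 3, 4]  # message 1 is still being retried

    kafka_style = OffsetCommitConsumer(5)
    kafka_style.commit_through(0)  # cannot commit past the unfinished message 1

    pulsar_style = IndividualAckConsumer(5)
    for msg_id in done:
        pulsar_style.ack(msg_id)

    print(kafka_style.redelivered_after_restart())   # [1, 2, 3, 4]
    print(pulsar_style.redelivered_after_restart())  # [1]
```

With offsets, the unfinished message 1 forces messages 2 through 4 to be redelivered after a restart. With individual acknowledgment, only message 1 is redelivered.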
Note that Pulsar's protocol handler mechanism lets brokers load protocol handlers dynamically at runtime. Handlers exist not just for the Kafka protocol but also for MQTT and AMQP. Multiple handlers can be enabled at the same time and work independently of each other.
Streamlined operations with enterprise-grade multi-tenancy
In many organizations that use Kafka, teams self-manage it in a decentralized way: each application team runs its own Kafka cluster (and probably its own Kubernetes cluster) with help from the platform or data team. This can cause problems with data governance, access control, data replication, and cost. For example, an organization may have to run a separate Kafka cluster for each use case or team to avoid sharing data. Each of those clusters must also be overprovisioned to avoid downtime and guarantee enough resources. On top of that, a single message might be copied hundreds of times because it must be replicated to every cluster used by different applications and teams. All of these problems stem from Kafka's lack of native multi-tenancy.
Note: Multi-tenancy can be added to Kafka with a paid solution such as Conduktor Gateway, but this is more expensive and can lead to vendor lock-in.
Unlike Kafka, Pulsar is designed as a multi-tenant system from the ground up. It features a three-level hierarchy of tenants, namespaces, and topics, which also provides an effective access control mechanism.
- Tenants provide a security boundary. Different teams of an organization can have their own tenant.
- Namespaces allow teams to keep their data separate from each other and support custom policies, such as data retention and storage quotas.
- Topics are named channels under namespaces for transmitting messages from producers to consumers.
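Pulsar encodes this hierarchy directly in topic names of the form `persistent://tenant/namespace/topic`. A short name such as `orders` resolves to the `public` tenant and the `default` namespace. The Python sketch below parses names under those rules. `parse_topic` and `TopicName` are hypothetical helpers, not Pulsar's client API. We assume, as in the open-source Kafka-on-Pulsar (KoP) project, that plain Kafka topic names land in a configurable default tenant and namespace; StreamNative Cloud's defaults may differ.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class TopicName:
    domain: str     # "persistent" or "non-persistent"
    tenant: str     # security boundary, e.g. one per team
    namespace: str  # unit for policies such as retention and quotas
    topic: str

    def __str__(self) -> str:
        return f"{self.domain}://{self.tenant}/{self.namespace}/{self.topic}"


def parse_topic(name: str, default_tenant: str = "public", default_namespace: str = "default") -> TopicName:
    """Parse a Pulsar-style topic name; bare names fall back to the default tenant/namespace."""
    domain = "persistent"
    if "://" in name:
        domain, name = name.split("://", 1)
        if domain not in ("persistent", "non-persistent"):
            raise ValueError(f"unknown topic domain: {domain!r}")
    parts = name.split("/")
    if len(parts) == 1:
        return TopicName(domain, default_tenant, default_namespace, parts[0])
    if len(parts) == 3:
        tenant, namespace, topic = parts
        return TopicName(domain, tenant, namespace, topic)
    raise ValueError(f"expected 'topic' or 'tenant/namespace/topic', got {name!r}")


if __name__ == "__main__":
    print(parse_topic("orders"))  # persistent://public/default/orders
    t = parse_topic("persistent://payments/prod/transactions")
    print(t.tenant, t.namespace)  # payments prod
```

Because the tenant and namespace are part of every topic's identity, policies and permissions can be applied at those levels without touching individual topics.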
Pulsar’s multi-tenancy allows for the segregation and independent processing of data streams in large-scale applications. For more information, download our eBook Multi-Tenancy and Isolation: Scaling Real-Time Data Across Teams with Apache Pulsar.
Unlimited data storage
One of the key benefits of the Kafka protocol on StreamNative Cloud lies in Pulsar’s ability to achieve unlimited data storage.
In Kafka, storage capacity is tied to the local disks of the brokers that host each partition, which makes storage difficult to scale. Because brokers are stateful, adding capacity means rebalancing partitions across them, which is a hard operation. As a result, a broker that reaches its maximum storage capacity can no longer reliably accept new messages. You can scale up Kafka brokers or provision a large storage cluster in advance, but either way you end up with costly infrastructure.
Our Kafka protocol on StreamNative Cloud lets you persist data for longer periods by leveraging BookKeeper, which stores data outside of Pulsar brokers. BookKeeper's storage servers, known as bookies, can be scaled independently. You can therefore expand your cluster to accommodate growing workloads without worrying about storage limits.
Another important feature that Pulsar natively offers to make unlimited data storage possible is tiered storage, which lets you keep cold data in cheaper storage for as long as your business needs it. By contrast, open-source Kafka has historically lacked native tiered storage. Kafka brokers require high-performance disks for both writes and reads, which can be expensive. Vendors like Confluent do provide tiered storage for Kafka, but relying on it locks you into that vendor.
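Pulsar can trigger offloading automatically based on a size threshold set on a namespace (for example, with `pulsar-admin namespaces set-offload-threshold`). The Python sketch below is a simplified model of that idea, not Pulsar's implementation. It moves the oldest sealed ledgers to object storage until the data kept on bookies fits under the threshold. The `Ledger` class and the sizes are hypothetical, and real offloading involves further details that this model ignores.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Ledger:
    ledger_id: int
    size_mb: int
    sealed: bool = True  # only closed ledgers are eligible; the open one keeps taking writes
    offloaded: bool = False


def apply_offload_threshold(ledgers: list[Ledger], threshold_mb: int) -> list[int]:
    """Offload the oldest sealed ledgers until at most `threshold_mb` stays on bookies.

    `ledgers` must be ordered oldest -> newest. Returns the ids that were offloaded.
    """
    on_bookies = sum(x.size_mb for x in ledgers if not x.offloaded)
    moved = []
    for ledger in ledgers:
        if on_bookies <= threshold_mb:
            break
        if ledger.offloaded or not ledger.sealed:
            continue
        ledger.offloaded = True  # data now lives in cheaper object storage
        on_bookies -= ledger.size_mb
        moved.append(ledger.ledger_id)
    return moved


if __name__ == "__main__":
    topic = [Ledger(1, 400), Ledger(2, 400), Ledger(3, 400), Ledger(4, 200, sealed=False)]
    print(apply_offload_threshold(topic, threshold_mb=500))  # [1, 2, 3]
```

Only the open ledger (200 MB here) stays on bookies, so the expensive tier holds the hot data while older segments remain readable from object storage.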
Enhanced scalability and elasticity
Scaling may be the biggest pain point in Kafka. When a Kafka broker goes down, a newly added broker cannot immediately serve the requests that were sent to the failed broker. You need to manually reassign the failed broker's partitions, which copies their data to the new broker, and this process can be a nightmare.
By contrast, Pulsar separates compute from storage, so both its processing nodes (brokers) and storage nodes (bookies) can scale vertically and horizontally. This architecture is more flexible than Kafka's. Brokers and bookies can be scaled independently, and neither layer needs to know what happens in the other.
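The practical effect of stateless brokers shows up during recovery. The Python sketch below is a deliberately simplified model with hypothetical function names and sizes. In the Kafka-style path, moving a failed broker's partitions to a replacement broker means copying their data. In the Pulsar-style path, the data already lives on bookies, so only topic ownership moves. Real Pulsar assigns ownership at the level of namespace bundles through its load manager; the round-robin here is a stand-in.

```python
from __future__ import annotations


def kafka_style_recover(owner, sizes_gb, failed, replacement):
    """Move the failed broker's partitions to `replacement`.

    Partition data lives on broker disks, so each move copies the partition's data.
    (Kafka also keeps serving from surviving replicas; this models restoring the
    failed broker's share onto a new broker.) Returns (new_owner_map, gb_copied).
    """
    new_owner, copied = dict(owner), 0
    for part, broker in owner.items():
        if broker == failed:
            new_owner[part] = replacement
            copied += sizes_gb[part]
    return new_owner, copied


def pulsar_style_recover(owner, failed, live_brokers):
    """Hand the failed broker's topics to live brokers.

    Data lives on bookies, so only ownership changes and nothing is copied.
    Returns (new_owner_map, gb_copied).
    """
    new_owner = dict(owner)
    orphaned = sorted(t for t, b in owner.items() if b == failed)
    for i, topic in enumerate(orphaned):
        new_owner[topic] = live_brokers[i % len(live_brokers)]  # stand-in for the load manager
    return new_owner, 0


if __name__ == "__main__":
    owner = {"orders-0": "b1", "orders-1": "b2", "clicks-0": "b1"}
    sizes = {"orders-0": 120, "orders-1": 95, "clicks-0": 300}
    print(kafka_style_recover(owner, sizes, "b1", "b3")[1])   # 420
    print(pulsar_style_recover(owner, "b1", ["b2", "b3"])[1])  # 0
```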
In terms of elasticity, Kafka requires you to carefully plan in advance how many partitions and broker nodes are required in the cluster. With the Kafka protocol on StreamNative Cloud, you can enjoy better elasticity at the following three levels:
- Consumer: You don’t need to perform topic repartitioning to add a consumer.
- Processing: Pulsar brokers are stateless and you can add a broker as needed.
- Storage: Bookies can handle requests immediately after being added, and you can offload data to external storage without adding nodes.
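The consumer-level difference can be sketched in a few lines of Python. In a Kafka consumer group, each partition is read by at most one member, so members beyond the partition count sit idle. A native Pulsar consumer on a Shared subscription receives messages regardless of the partition count. Both functions are hypothetical simplifications. Real Kafka uses pluggable partition assignors, and Pulsar's Shared dispatch also depends on each consumer's available permits rather than strict round-robin.

```python
from __future__ import annotations


def kafka_group_assignment(num_partitions: int, consumers: list[str]) -> dict[str, list[int]]:
    """Each partition goes to exactly one group member (round-robin here)."""
    assignment: dict[str, list[int]] = {c: [] for c in consumers}
    for p in range(num_partitions):
        assignment[consumers[p % len(consumers)]].append(p)
    return assignment


def shared_subscription_dispatch(messages: list[str], consumers: list[str]) -> dict[str, list[str]]:
    """Pulsar Shared subscription: messages are spread over all consumers, whatever the partition count."""
    dispatched: dict[str, list[str]] = {c: [] for c in consumers}
    for i, msg in enumerate(messages):
        dispatched[consumers[i % len(consumers)]].append(msg)
    return dispatched


def idle(consumers_to_work: dict[str, list]) -> list[str]:
    """Consumers that received no partitions or messages."""
    return [c for c, work in consumers_to_work.items() if not work]


if __name__ == "__main__":
    consumers = ["c1", "c2", "c3", "c4"]
    print(idle(kafka_group_assignment(2, consumers)))  # ['c3', 'c4']
    msgs = [f"m{i}" for i in range(8)]
    print(idle(shared_subscription_dispatch(msgs, consumers)))  # []
```

With two partitions, the third and fourth group members get nothing until the topic is repartitioned. On the Shared subscription, all four consumers share the load.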
These key benefits also make the Kafka protocol on StreamNative Cloud cost-effective. Specifically, it helps save costs in the following ways:
- Unified streaming and messaging. As a two-in-one system, Pulsar frees you from maintaining a separate queuing system, which reduces infrastructure overhead. StreamNative Cloud also supports other protocols such as MQTT and AMQP, which can be used alongside the Kafka protocol.
- Multi-tenancy. Multiple users or applications can share the same Pulsar cluster while being isolated from each other. Cluster operators do not need to create separate clusters for each tenant as they can share the same resources, thus reducing infrastructure costs.
- Scalability and elasticity. Because Pulsar scales brokers and bookies independently, you only pay for the nodes needed to handle your real-time workloads. Pulsar's scalability and elasticity also lead to better resource utilization.
- Tiered storage. As mentioned above, Pulsar's architecture stores messages more efficiently than Kafka. With tiered storage, less frequently accessed data can be offloaded to cheaper storage systems, such as Amazon S3 and Google Cloud Storage, greatly reducing storage costs for large data sets.
Smooth ecosystem integration
One advantage Kafka has over Pulsar is its rich ecosystem of tools. Kafka has established itself as an industry standard. It offers a wide range of connectors and client libraries that integrate seamlessly with popular data streaming, processing, and database frameworks. Pulsar currently supports connectors for systems like Spark, Flink, and Elasticsearch, but many more connectors are available in the Kafka ecosystem.
The Kafka protocol on StreamNative Cloud lets you use Kafka connectors as additional Pulsar connectors (for example, the Pulsar-Druid connector used by engineers at Nutanix). This way, organizations can continue to use their existing Kafka connectors and other ecosystem tools without major modifications. This compatibility makes the transition smoother, minimizes the learning curve, and lets organizations leverage their existing investments in Kafka.
These benefits are only some of the major differentiators. As organizations expand Pulsar adoption across teams and use cases, they can use the Kafka protocol on StreamNative Cloud as a gateway to Pulsar and embrace more of the native features Pulsar has to offer.
Real-world use cases
By integrating two popular event-streaming ecosystems, teams can harness the unique benefits of each ecosystem and build a unified event streaming platform with Pulsar to accelerate the development of real-time applications and services.
- Real-time Fraud Detection: In the financial industry, teams can use data from legacy Kafka applications to detect fraudulent activities in real time. Transaction data from multiple sources, such as credit card transactions and online payments, can be ingested into Pulsar. Stream processing applications can analyze the data in real time, identify suspicious patterns, and trigger alerts or take actions to prevent fraud.
- Supply Chain Optimization: By streaming data from different stages of the supply chain, such as inventory systems, logistics providers, and point-of-sale systems, organizations can gain real-time visibility into their supply chain operations. This allows them to proactively identify bottlenecks, optimize inventory levels, and improve overall efficiency.
- Gaming Telemetry and Analytics: In the gaming industry, the Kafka protocol on Pulsar can be used to collect and process telemetry data from games and game servers, such as player actions, game events, and performance metrics. Real-time analytics can then monitor player behavior, identify cheating or hacking attempts, and optimize game balancing and monetization strategies.
The Kafka protocol is now available on StreamNative Cloud, which delivers fully managed Apache Pulsar in the cloud of your choice. It offers three deployment options to easily and safely connect to your existing tech stack with StreamNative’s reliable, turnkey service. To get started, follow the instructions to use the Kafka protocol on StreamNative Cloud.
The Kafka protocol on StreamNative Cloud brings together the best of both Kafka and Pulsar, providing a powerful solution for modern data streaming needs. Organizations can keep running their legacy Kafka applications while gaining the benefits that Pulsar has to offer, which unlocks new opportunities for data-driven innovation. They can also use it as a gateway to broader Pulsar adoption without disrupting their Kafka applications.
Pulsar has become one of the most active Apache projects over the past few years, with a vibrant community driving innovation and improvements to the project. Check out the following resources to learn more about Pulsar.