This is Part 2 of a two-part series in which we share our perspectives on Pulsar vs. Kafka. In Part 1, we compared Pulsar and Kafka from an engineering perspective and discussed performance, architecture, and features. In Part 2, we aim to provide a broader business perspective by sharing insights into Pulsar's rapidly growing popularity.
Data is transforming the business landscape, with major industry leaders like Amazon, Uber, and Netflix demonstrating how access to real-time data, data messaging, and processing capabilities can translate to better products and customer experiences, disrupt entire industries, and generate billions in revenue. This need for real-time insights across industries is driving adoption and innovation in the messaging space.
As companies look to adopt real-time streaming solutions for new, innovative applications and to improve their existing systems, business leaders are seeking to better understand the respective advantages and disadvantages associated with the top technologies in the space, namely Pulsar, Kafka, and RabbitMQ.
Today, companies' messaging needs are increasingly complex and many organizations require a more comprehensive solution than RabbitMQ or Kafka can provide on their own. While RabbitMQ is best suited for message queueing and Kafka can manage data pipelines, Pulsar can accomplish both.
Companies that have a need for both types of messaging are increasingly choosing Pulsar for its flexibility, scalability, and ability to simplify operations by delivering multiple messaging functions on the same platform. Pulsar provides unique, sought-after capabilities, such as unified messaging and the ability to build streaming-first applications, which are powering some of today's most advanced companies.
However, because Pulsar is a younger technology, some are less familiar with its capabilities. In this post, we will address some common misconceptions about Pulsar and show Pulsar's growing popularity as evidenced by its rapid growth in adoption, an increase in the number and variety of use cases, and its ever-expanding community. We will also address the risks associated with adopting a new technology and explain why maintaining the status quo presents the risk of being left behind in a quickly changing landscape.
We have chosen to frame our discussion around commonly asked questions.
To provide some insight into Pulsar's maturity and real-world use cases, we'll start with a brief background on its origin and development.
Pulsar's development began within Yahoo in 2012. It was committed to open source in 2016 and became a top-level Apache project in 2018. It has enterprise support from StreamNative. Pulsar enjoys several advantages as a newer entrant into the messaging space. Specifically, its developers at Yahoo had previously worked on Kafka and other traditional messaging technologies and knew the shortcomings of these platforms first-hand. As a result, they designed Pulsar with distinct advantages: it is easier to operate, and it provides features, such as unified messaging and tiered storage, that introduce new capabilities well-suited for emerging use cases.
By comparison, Kafka originated within LinkedIn. It was committed to open source in 2011 and became a top-level Apache project in 2012. As the first major event-streaming platform on the market, it is widely recognized and widely adopted. Kafka receives enterprise support from a number of companies, including Confluent. Compared to Pulsar, Kafka is a more mature and more popular technology with a bigger community and a more advanced ecosystem.
Pulsar has seen tremendous growth, particularly over the past 18 months. It has been adopted by a growing list of global media companies, technology companies, and financial institutions. Below are examples of significant enterprise-level use cases that illustrate Pulsar's ability to handle mission-critical applications.
Tencent's adoption of Pulsar for their transactional billing system, Midas, demonstrates Pulsar's ability to handle mission-critical applications and provides compelling evidence that the technology has been rigorously tested and performs well in demanding environments. Midas operates at a massive scale, processing more than 10 billion transactions and 10+ TB of data daily. The billing system is a critical piece of infrastructure for a company with over $50 billion in annual revenue.
Verizon Media provides another compelling use case, having successfully operated Pulsar in production for over five years. Verizon Media, via its acquisition of Yahoo, is the original developer of Pulsar. In their recent Pulsar Summit talk, Joe Francis and Ludwig Pummer of Verizon Media described Pulsar as a "battle-tested" system that is being used throughout the Verizon Media landscape. They shared that Pulsar routinely handles up to 3 million write requests/second on more than 2.8 million distinct topics. Pulsar has satisfied Verizon Media's need for a low-latency, highly available system that can be scaled easily and has the ability to support a business that operates across six global data centers.
Another key adoption story comes from Splunk, a company that has used Kafka in production environments for years. During a recent Pulsar Summit talk, "Why Splunk Chose Pulsar", Karthik Ramasamy shared Splunk's reasons for choosing Pulsar to power its next-generation analytics product, Splunk DSP, which handles billions of events per day. Ramasamy explained that Pulsar was able to meet 18 key requirements and cited its ease of scalability, lower operating costs, better performance, and strong open-source community as major factors in their decision to adopt Pulsar.
The above use cases clearly demonstrate that Pulsar is a powerful solution that many industry leaders are choosing to power critical business infrastructure. Although Kafka is more mature and more widely used, Pulsar's rapid rate of adoption is evidence of its strong capabilities and readiness for mission-critical use cases.
While major technology and media companies, such as Uber and Netflix, have been able to successfully build unified batch and stream processing and streaming-first applications to power their real-time data needs, most companies lack the vast engineering and financial resources these applications typically require. However, Pulsar offers advanced messaging capabilities that enable companies to overcome many of these challenges.
Below, we highlight three unique capabilities - some current and others still in development - that distinctly set Pulsar apart from its competitors.
Two of the most common types of messaging used today are application messaging (traditional queuing systems) and data pipelines. Application messaging is used to enable asynchronous communications (often developed on platforms such as RabbitMQ, AMQP, JMS, among others), while data pipelines are used to move high volumes of data between different systems (such as Apache Kafka or AWS Kinesis). Because these two types of messaging are performed on different systems and serve different functions, companies often need to operate both. Developing and managing separate systems is not only expensive and complex, but can also make it difficult to integrate systems and centralize data.
Pulsar's core technology gives users the ability both to deploy it as a traditional queuing system and use it in data pipelines, uniquely positioning Pulsar as the ideal platform to provide unified messaging capabilities. Unified messaging makes it easier for organizations to capture and distribute their data, which facilitates the use of real-time data to drive business innovation.
Pulsar also recently added tools - Kafka-on-Pulsar (KoP) and AMQP-on-Pulsar (AoP) - that make it even easier for companies to leverage these unified messaging capabilities. (We discuss KoP and AoP in more detail below.)
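The two consumption models Pulsar unifies can be sketched in a few lines. In Pulsar, the same topic can be read with a "Shared" subscription (queuing: each message is delivered to exactly one consumer in a group) or an "Exclusive" subscription (streaming: a single consumer reads the full ordered log). The following is a minimal in-memory illustration of those two dispatch semantics, written in plain Python; it models the delivery rules only and is not the Pulsar client API.

```python
# In-memory sketch of the two consumption semantics Pulsar unifies
# on a single topic. Illustrative only; not the Pulsar client API.
from itertools import cycle

messages = [f"msg-{i}" for i in range(6)]

# Queuing (Pulsar "Shared" subscription): each message is delivered
# to exactly one consumer, load-balanced across the group.
def shared_dispatch(messages, consumers):
    delivered = {c: [] for c in consumers}
    for msg, consumer in zip(messages, cycle(consumers)):
        delivered[consumer].append(msg)
    return delivered

# Streaming (Pulsar "Exclusive" subscription): one consumer reads
# the entire log in order, preserving the full event sequence.
def exclusive_dispatch(messages, consumer):
    return {consumer: list(messages)}

queue_view = shared_dispatch(messages, ["worker-a", "worker-b"])
stream_view = exclusive_dispatch(messages, "stream-reader")

print(queue_view)   # each worker sees a disjoint subset of messages
print(stream_view)  # the single reader sees the whole ordered log
```

On a real Pulsar cluster, the consumer type is simply a subscription option on the same topic, which is what allows one platform to serve both workloads.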
Because companies today need to be able to make timely decisions and react to change quickly, the need for real-time, meaningful data has never been more critical. At the same time, it is crucial to be able to integrate and understand large amounts of historical data in order to gain a complete picture of a business.
Traditional Big Data systems (such as Hadoop) facilitate decision-making by allowing organizations to analyze massive historical data sets. However, as these systems can take minutes, hours, or even days to process data, they struggle to integrate real-time data and the results they produce are often of limited value.
Stream processors, such as Kafka Streams, are adept at processing streaming data and computing answers closer to real-time, but are not a good fit for processing large historical datasets. Many organizations need to run both batch and streaming data processors in order to gain the insights they need for their business. However, maintaining multiple systems is expensive and each system has its own respective challenges.
More recently, systems have been developed which can do both batch and stream processing. Apache Flink is one example. Currently, Flink is used for stream processing with both Kafka and Pulsar. However, Flink's batch capabilities do not pair well with Kafka, because Kafka can only deliver data as a stream, which is too slow for most batch workloads.
By contrast, Pulsar's tiered storage model provides the batch storage capabilities needed to support batch processing in Flink. In the near future, Flink's batch processing capabilities will be integrated with Pulsar, enabling companies to query both historical and real-time data quickly and more easily, unlocking a unique competitive advantage.
Web application development is in the midst of a major transformation as companies look to develop more sophisticated software. The traditional application model that pairs a single monolithic application with a large SQL database is giving way to applications composed of many smaller components, or "microservices."
Many organizations are now adopting microservices because they offer greater flexibility to meet changing business needs and help facilitate development across growing engineering teams. However, microservices introduce new challenges, such as the need to enable communication among various components and keep them synchronized.
With a newer microservices technique called "event sourcing," applications produce and broadcast streams of events into a shared messaging system which captures the event history in a centralized log. This improves the flow of data and helps keep applications in sync.
But event sourcing can be difficult to implement as it requires both traditional messaging capabilities and the ability to store event history for long periods of time. While Kafka is capable of storing streams of events for days or weeks, event sourcing typically requires longer retention times. This added challenge often requires users to build multiple tiers of Kafka clusters to manage the growth of event data, plus additional systems to manage and track data collectively.
By contrast, Pulsar's unified messaging model is a natural fit, as it can easily distribute events to other components and effectively store event streams for indefinite periods of time. This unique design feature makes Pulsar especially attractive to companies looking to acquire dynamic, streaming-first capabilities.
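The event-sourcing pattern described above can be sketched briefly: every change is appended to a shared, durable log, and any component can rebuild its current state by replaying that history. The following minimal Python sketch uses a plain list to stand in for a retained topic; the event names and the account-balance example are hypothetical.

```python
# Hypothetical sketch of event sourcing over a shared log.
# The list stands in for a topic with indefinite retention.
events = []

def append_event(event_type, data):
    """Record a change as an immutable event in the shared log."""
    events.append({"type": event_type, "data": data})

def replay_balance(log):
    """Rebuild an account balance purely by replaying event history."""
    balance = 0
    for event in log:
        if event["type"] == "deposited":
            balance += event["data"]
        elif event["type"] == "withdrawn":
            balance -= event["data"]
    return balance

append_event("deposited", 100)
append_event("withdrawn", 30)
append_event("deposited", 50)

print(replay_balance(events))  # state derived from the full history: 120
```

Because state is derived from the log rather than stored directly, any number of components can consume the same events and stay in sync, which is why long-term, inexpensive event retention matters for this pattern.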
While unified messaging, combined batch and event-streaming storage, and a "streaming-first" approach might be feasible to achieve with other systems, these features would be complex to implement and would require a great deal of effort and investment. In contrast, Pulsar's design includes all of these features, enabling users to adapt to the changing technology landscape easily and with far less complexity.
A snapshot comparison of the Pulsar and Kafka communities today reflects that Kafka's is larger overall, with more Slack users and more Stack Overflow questions. While Pulsar's community is currently smaller, it is highly engaged and rapidly growing. Below are some highlights of its recent momentum.
In June, Pulsar held its first global event - the Pulsar Summit Virtual Conference 2020. The event featured more than 30 speaker sessions from Pulsar's top contributors, thought leaders, and developers. We heard real-world Pulsar adoption stories and received insights from companies such as Verizon Media, Splunk, Iterable, and OVHcloud.
With more than 600 sign-ups - including attendees from top internet, technology, and financial institutions such as Google, Microsoft, AMEX, Salesforce, Disney, and Paypal - the event revealed a highly engaged and global Pulsar community and demonstrated that interest in Pulsar is burgeoning.
In fact, the global Pulsar community has since asked us to host dedicated regional events in Asia and Europe. To meet this growing demand, we have scheduled Pulsar Summit Asia 2020 in October and are currently planning Pulsar Summit Europe.
In addition to facilitating large, widely attended summits, the Pulsar community is focusing on interactive training and online events. For example, earlier this year, the community, led by StreamNative, launched a weekly live-streaming, interactive tutorial called TGIP (Thank Goodness It's Pulsar) that provides technology updates and hands-on tutorials highlighting various operational aspects. TGIP sessions are available on YouTube and StreamNative.io and are helping to augment Pulsar's growing knowledge base.
In 2020, the Pulsar community also launched monthly webinars to share best practices, new use cases, and technology updates. Recent webinars have been hosted by strategic commercial and open-source partners such as OVHcloud, Overstock, and Nutanix. On July 28th, StreamNative will be hosting Operating Pulsar in Production as a panel discussion with additional participants from Verizon Media and Splunk.
Pulsar's ecosystem has further evolved with the expansion of professional training, which is available through StreamNative and other partners. In fact, Pulsar and Kafka expert Jesse Anderson recently led an in-depth training session on Developing Pulsar Applications. Professional training sessions help to enlarge the pool of Pulsar-trained engineers and allow Pulsar users to accelerate their messaging and streaming platform development initiatives.
In addition, an increase in the publication of whitepapers is helping to expand Pulsar's knowledge base.
Committed community partners have also contributed to key project advancements. Below, we look at two recent product launches.
In March 2020, OVHcloud and StreamNative launched Kafka-on-Pulsar (KoP), the result of the two companies working closely in partnership. KoP enables Kafka users to migrate their existing Kafka applications and services to Pulsar without modifying the code. Although only recently released, KoP has already been adopted by several organizations and is being used in production environments. Moreover, KoP's availability is helping to expand Pulsar's adoption.
In June 2020, China Mobile and StreamNative announced the launch of another major platform upgrade, AMQP on Pulsar (AoP). Similar to KoP, AoP allows organizations currently using RabbitMQ (or other AMQP message brokers) to migrate existing applications and services to Pulsar without code modification. Again, this is a key initiative that will help drive the adoption and usage of Pulsar.
The events and initiatives described above illustrate the Pulsar community's firm commitment to education and ecosystem development. More importantly, they demonstrate the momentum and growth we can expect in the future.
In today's ever-changing business landscape, access to data can unlock innovative business opportunities, define new categories, and propel companies ahead of the competition. As a result, organizations are increasingly seeking to leverage their data and the insights that can be gained from it to develop competitive advantages, and they are seeking new technologies to help them achieve these goals.
In this post, we set out to address some common business concerns organizations face when evaluating a new technology. These include the technology's proven capabilities, its ability to enable in-demand business use cases, and, in the case of open-source technologies, the size and level of engagement within the project's community.
The Tencent, Verizon Media, and Splunk use cases described earlier demonstrate Pulsar's ability to deliver mission-critical applications in the real world. Beyond its proven capabilities, Pulsar's ability to deliver unified messaging and streaming-first applications provides a marked advantage by enabling organizations to build disruptive, competitive technologies without requiring extensive resources. Pulsar's integration with Flink, which is currently in development, will provide yet another competitive advantage: the ability to perform both batch and stream processing on the same platform.
While the Pulsar community and a few other key areas, such as documentation, are still small, their growth has increased considerably in the past 18 months. Pulsar's highly engaged and quickly growing community and ecosystem are committed to contributing to the ongoing expansion of Pulsar's knowledge base and training materials, while also accelerating the development of key capabilities.
Disruption can happen quickly and organizations evaluating any technology need to consider not only the strengths and weaknesses it has today, but also how the technology will continue to grow and evolve to meet business needs in the future. The combination of Pulsar's enhanced messaging offering and unique capabilities make it a strong alternative that should be considered by any company looking to develop real-time data streaming capabilities.
For a deeper dive into Pulsar vs. Kafka — A More Accurate Perspective on Performance, Architecture, and Features, please read Part 1 of this series here.
We encourage you to sign up for the Pulsar Newsletter to stay up-to-date on upcoming events and technology updates. If you would like to chat with current Pulsar users, you can join the Pulsar Slack Channel.
And don't forget to join our webinar, Operating Pulsar in Production, on Tuesday, July 28th at 10 am. This will be a highly interactive roundtable discussion with additional participants from Verizon Media, Splunk, and StreamNative.
We would like to thank the many members of the Pulsar community who contributed to this article - especially, Jerry Peng, Jesse Anderson, Joe Francis, Matteo Merli, Sanjeev Kulkarni, and Addison Higham.