Pulsar User Survey 2021 Highlights
June 11, 2021
As noted in the 2021 Apache Pulsar User Survey Report, Apache Pulsar adoption and community engagement skyrocketed over the past year.
Key trends driving Pulsar adoption include the move to containers and cloud strategies, the need to solve for unprecedented scale and management complexity, the pivot from a pure streaming workload to unified batch and streaming workloads, and the need to unlock new use cases.
Pulsar’s cloud-native capabilities, unified messaging and streaming, scalability and reliability, and superset of built-in features that enable new use cases and streamline operations make it uniquely positioned to meet many of today’s emerging needs.
In this report, we look at the key takeaways from the 2021 Apache Pulsar User Survey Report:
- Pulsar in Production and at Scale
- Kafka Users Adopt Pulsar
- Cloud Native Initiatives and K8s Drive Pulsar Adoption
- Pulsar + Flink: Pulsar Continues to Innovate
Below we take a look at each of these highlights in more detail.
1. Pulsar in Production and at Scale
The two most important takeaways from the Pulsar User Survey 2021 are:
- The growth in the number of companies using Pulsar in production.
- The growth in the number of companies using Pulsar at enterprise scale.
While the increase in Pulsar adoption is significant, the increase in production deployments has seen the most meaningful growth (see graph above). The 2021 Survey Report reveals that 51% of respondents were using Pulsar in production, compared to 31% the year prior. The increase in production use cases demonstrates Pulsar's ability to deliver mission-critical applications in the real world.
Pulsar at Scale
Question: How many messages does your organization process with Pulsar every day? Response: 12% of the respondents process over one trillion messages per day.
Pulsar has also seen an increase in the number of large-scale enterprise deployments. 12% of respondents shared that their organization processes more than 1 trillion messages per day using Pulsar. Tencent, Splunk, Newland Digital Technology Co Ltd, Kingsoft Cloud, and Pactera are just a handful of the companies that use Pulsar at this scale.
The increase in companies running Pulsar at a large scale illustrates its ability to meet the scalability, reliability, and flexibility needs of companies today. Notably, Pulsar is meeting the needs of companies seeking a unified messaging and streaming platform.
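To put that figure in perspective, the short sketch below converts a daily message count into an average per-second rate. It assumes, purely for illustration, that traffic is spread evenly across the day (real workloads are usually bursty), and it uses 1 trillion as the input because that is the lower bound reported in the survey. The function name is hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def average_messages_per_second(messages_per_day: int) -> float:
    """Average rate, assuming messages are spread evenly over 24 hours."""
    return messages_per_day / SECONDS_PER_DAY


# 1 trillion messages per day, the survey's lower bound for this group.
rate = average_messages_per_second(10**12)
print(f"{rate:,.0f} messages/second")  # prints "11,574,074 messages/second"
```

In other words, a deployment at this scale sustains an average of more than 11 million messages every second, with peaks likely higher.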
2. Kafka Users Adopt Pulsar
Question: What other message queues does your organization use in addition to Pulsar? Response: 68% of respondents use Kafka in addition to Pulsar.
Question: If you use connectors, which connectors do you use or plan to use for Pulsar? Response: 34% of respondents said Kafka on Pulsar (KoP).
Kafka Users Adopt Pulsar
A major insight from the user survey is the number of Kafka users who are adopting Pulsar. 68% of respondents said that they use Kafka in addition to Pulsar. Given that Kafka is an older and more widely adopted technology, we can infer that these are companies that were already using Kafka and then decided to adopt Pulsar (versus Pulsar users who are adopting Kafka).
The figure below, from API7 (1), demonstrates the increase in Pulsar project engagement. Perhaps even more interesting, it shows that the Apache Pulsar community has surpassed Apache Kafka in monthly active contributors.
The 2021 survey also shows that more than one third of respondents use, or are planning to use, Kafka on Pulsar (KoP). KoP, which was launched in 2020, enables Kafka users to migrate their existing Kafka applications and services to Pulsar without modifying code.
KoP reduces barriers to Pulsar adoption for Kafka users and its popularity reveals that Kafka users are increasingly looking to Pulsar to solve problems and to enable use cases they are not able to achieve with Kafka.
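As a rough sketch of what "without modifying code" can look like in practice: KoP lets Pulsar brokers speak the Kafka wire protocol, so an existing Kafka client typically only needs to be pointed at a different address. The example below builds a producer configuration using standard Kafka property names and compares the "before" and "after" settings. The hostnames, port, and client ID are hypothetical, and the actual KoP listener address depends on how the Pulsar cluster is configured.

```python
# Hypothetical addresses, for illustration only.
KAFKA_BOOTSTRAP = "kafka-broker.example.com:9092"
KOP_BOOTSTRAP = "pulsar-broker.example.com:9092"  # Pulsar broker with KoP enabled


def producer_config(bootstrap_servers: str) -> dict:
    """Build a Kafka producer config using standard Kafka property names."""
    return {
        "bootstrap.servers": bootstrap_servers,
        "client.id": "orders-service",  # hypothetical application name
        "acks": "all",
    }


before = producer_config(KAFKA_BOOTSTRAP)
after = producer_config(KOP_BOOTSTRAP)

# List the settings that differ between the two deployments.
changed = sorted(key for key in before if before[key] != after[key])
print(changed)  # prints ['bootstrap.servers']
```

A dictionary like this would be handed to an unchanged Kafka client; the application logic that produces and consumes messages stays the same.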
Kafka and Pulsar Serve Different Use Cases
The high percentage of respondents (68%) using both Kafka and Pulsar may seem counterintuitive, as the technologies serve many of the same use cases. But, in fact, there are distinct differences in Pulsar and Kafka’s use cases and capabilities.
Kafka was built to support data pipelines and large scale data movement to centralized locations. Pulsar, by contrast, was created to serve both messaging and data streaming use cases that require handling more topics with complex topologies and sophisticated consumption models.
Pulsar’s built-in multi-tenancy, geo-replication, and scalability enable new use cases and capabilities that Kafka cannot match. The top Pulsar use cases are: (1) Message Queues, (2) Pub/Sub, (3) Data Pipelines, (4) Streaming Processing, (5) Microservices/Event Sourcing, (6) Data Integration, (7) Change Data Capture, and (8) Streaming ETL. This list demonstrates Pulsar’s ability to solve for a broader range of use cases.
Below we look at some Pulsar adoption stories from the past 12 months:
- A key Kafka-to-Pulsar adoption story comes from Splunk, a company that used Kafka in production environments for years. At the Pulsar Summit 2020, Karthik Ramasamy shared details on Splunk's decision to adopt Pulsar for the Splunk DSP, an analytics product which handles billions of events per day. You can find the full details in this video on "Why Splunk Chose Pulsar".
- Tencent adopted Pulsar to solve issues with scale and reliability. Pulsar was first adopted to power their billing platform, Midas. Adoption then spread to Tencent’s Federated Learning Platform and to Tencent’s Gaming Department, where Pulsar was used to replace Kafka for its logging pipeline. You can learn more about Tencent’s adoption of Pulsar here.
- Iterable is another example of Pulsar adoption spreading. Iterable first adopted Pulsar to replace one messaging system, RabbitMQ, and they are now in the process of using Pulsar to replace Kafka and Amazon SQS. You can read the full story here.
The survey report shows that, once adopted, Pulsar tends to spread across an organization; Tencent and Iterable are just two examples. When asked, “Will your organization build more applications on Pulsar in 2021?”, 66% of respondents said “Yes” and another 10% said “Under Consideration.” That means 76% of Pulsar adopters are planning or considering an expansion of their Pulsar adoption.
3. Cloud Native Initiatives and K8s Drive Adoption of Pulsar
- 80% of Pulsar users deploy in a cloud environment
- 62% of Pulsar users deploy on Kubernetes
- 49% noted Pulsar’s “cloud native” capabilities as one of the top reasons they chose to adopt Pulsar
The adoption of Pulsar is being driven by a larger industry move to the cloud and Kubernetes. As part of this move, organizations are looking for technologies that run in the cloud, scale well, and can leverage and run well on top of Kubernetes.
Single-tenant technologies with monolithic architectures and no geo-replication or multi-cloud capabilities are not able to meet the needs of modern data applications. As a result, companies are increasingly looking to adopt cloud-native technologies, like Pulsar, to meet their business needs.
The move to Kubernetes is not a simple lift and shift. The transition requires new development models and new ways of working, and it is causing companies to re-evaluate how existing technologies will be deployed and managed in the cloud. For example, technologies such as Kafka that were designed before the cloud was commonplace can be difficult to map to the capabilities of the cloud and Kubernetes. These factors are leading companies to best-of-breed cloud-native technologies, including Pulsar.
4. Pulsar + Flink: Pulsar Continues to Innovate
Companies today are looking for a complete streaming solution, and Pulsar’s integration with Flink is significant because it creates another differentiator for the Pulsar community. From the 2020 Survey to the 2021 Survey, the number of Pulsar + Flink use cases almost doubled. As noted above, the adoption of Pulsar is often driven by companies seeking to enable new use cases, and the Pulsar + Flink integration is an example of this.
Stream processors, such as Kafka Streams, are adept at relatively simple processing of streaming data and computing answers close to real-time, but they are not a good fit for processing large historical datasets or datasets that require many joins and complex analysis. Many organizations need to run both batch and streaming data processors in order to gain the insights they need for their business, but maintaining multiple systems is expensive and complex.
More recently, systems have been developed that can do both batch and stream processing. Apache Flink is one example. Currently, Flink is used for stream processing with both Kafka and Pulsar. However, Flink's batch capabilities are not particularly compatible with Kafka, because Kafka is only able to deliver data in streams, which makes it too slow for most batch workloads.
Pulsar's tiered storage model provides the batch storage capabilities needed to support batch processing in Flink. With Flink + Pulsar, companies are able to query both historical and real-time data quickly and easily, unlocking a unique competitive advantage.
(1) “Monthly Active Contributors.” API7, 10 Jun, 2021, https://www.apiseven.com/en/contributor-graph?chart=contributorMonthlyActivity&repo=apache/pulsar,apache/kafka