Sep 25, 2024

Celebrating the 6th Anniversary of Apache Pulsar as a Top-Level ASF Project: A Journey of Innovation and Community

Matteo Merli
Co-Founder and CTO, StreamNative
Sijie Guo
Co-Founder and CEO, StreamNative

About ten years ago, the Pulsar team at Yahoo! was tasked with building a multi-tenant, unified messaging and data streaming platform. The technology went on to power hundreds of real-time business applications within Yahoo! and eventually became Apache Pulsar. Six years ago, Apache Pulsar graduated to become a Top-Level Project (TLP) of the Apache Software Foundation (ASF). That milestone marked the beginning of an incredible journey, one filled with technological innovation, community growth, and an ever-evolving vision for democratizing data streaming. Today, we celebrate not just this achievement but also the people, the technology, and the vision that continue to propel Apache Pulsar and its fast-growing community forward in the age of data streaming.

Pulsar’s Community Growth: Building Together

From its inception, Apache Pulsar has been driven by a strong, collaborative community. Over the past six years, this community has blossomed into one of the most vibrant ecosystems in the Apache Software Foundation (Pulsar has been ranked among the top 5 most active projects in the ASF). Developers, contributors, and companies from across the globe have united around Pulsar’s unique architecture, which separates storage from compute, and its unparalleled capabilities, such as a unified messaging model, geo-replication, and multi-tenancy. Together they have driven its adoption across industries ranging from financial services to automotive, marketing technology, e-commerce and retail, IoT, and beyond. We have witnessed many mission-critical businesses built on Apache Pulsar: it handles tens of billions of billing requests in one of the largest billing and payment systems, supports hundreds of millions of players in online games, and serves as the company-wide messaging and streaming platform for enterprises and unicorns.

The Pulsar community has grown exponentially. What started as a small group of dedicated developers has expanded to hundreds, even thousands, of contributors, with Pulsar user groups in major regions around the world. Many Fortune 100 enterprises, unicorns, startups, and individual developers alike have adopted Pulsar as their messaging and streaming platform for mission-critical, real-time transactional workloads. This growth reflects not only the robustness of the technology but also the community’s commitment to advancing the platform and making Pulsar accessible to all.

This community has contributed not only code but also knowledge through meetups, webinars, and events, ensuring that Pulsar is more than just a technology—it’s a movement. The success of Pulsar is a testament to the power of collaboration and open-source innovation.

From Kafka to Pulsar to Ursa: Shaping the Future of Data Streaming

Over the years, Apache Pulsar has evolved from a scalable messaging system into an open, multi-protocol data streaming platform. It has become the backbone of real-time data streaming architectures, empowering modern enterprises with mission-critical transactional workloads. Pulsar’s success can be attributed to its unique features across several key dimensions:

Cloud-Native Architecture: Pulsar’s multi-layered architecture separates compute (stateless brokers) from storage (Apache BookKeeper), enabling unmatched scalability, durability, and flexibility. Because brokers hold no data and topics are stored as segments spread across storage nodes, adding capacity does not require rebalancing existing data, allowing Pulsar to scale up to 1000x faster than other data streaming platforms.

Unified Messaging & Data Streaming: Pulsar remains the only system that seamlessly unifies message queuing and data streaming in a single model. Its flexible subscription model (Exclusive, Shared, Failover, and Key_Shared subscriptions) lets developers store a single copy of data and consume it multiple times in different ways, tailored to business needs. This unique queuing capability has empowered enterprises and unicorns to run their most mission-critical transactional workloads. Meanwhile, other platforms like Kafka are still in the early stages of trying to implement similar features, with production use still years away.
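To make the "single copy, many consumption patterns" idea concrete, here is a minimal Python sketch under stated assumptions. It is a toy in-memory model, not the Pulsar client API: Topic, Subscription, and dispatch are hypothetical names, and the two modes only loosely mirror Pulsar’s Exclusive (one consumer reads everything in order) and Shared (messages are spread across consumers, queue-style) subscription types. Each subscription keeps its own cursor over the same stored log, so adding a subscription never copies the data.

```python
from dataclasses import dataclass, field
from itertools import cycle


@dataclass
class Topic:
    """Toy append-only log: each published message is stored exactly once."""
    log: list = field(default_factory=list)

    def publish(self, msg):
        self.log.append(msg)


@dataclass
class Subscription:
    """Toy subscription (hypothetical, not Pulsar's API): a cursor over a shared log.

    mode="exclusive": exactly one consumer receives every message in order.
    mode="shared":    messages are dispatched round-robin across consumers.
    """
    topic: Topic
    mode: str
    consumers: list
    cursor: int = 0  # index of the next unread message in topic.log

    def __post_init__(self):
        if not self.consumers:
            raise ValueError("a subscription needs at least one consumer")
        if self.mode not in ("exclusive", "shared"):
            raise ValueError(f"unknown mode: {self.mode!r}")
        if self.mode == "exclusive" and len(self.consumers) != 1:
            raise ValueError("exclusive mode allows exactly one consumer")

    def dispatch(self):
        """Deliver all unread messages and return {consumer: [messages]}."""
        delivered = {c: [] for c in self.consumers}
        targets = cycle(self.consumers)  # with one consumer, it gets everything
        while self.cursor < len(self.topic.log):
            delivered[next(targets)].append(self.topic.log[self.cursor])
            self.cursor += 1
        return delivered


# One stored copy of the data, consumed two different ways.
orders = Topic()
for i in range(4):
    orders.publish(f"order-{i}")

audit = Subscription(orders, "exclusive", ["auditor"])
workers = Subscription(orders, "shared", ["w1", "w2"])

print(audit.dispatch())    # {'auditor': ['order-0', 'order-1', 'order-2', 'order-3']}
print(workers.dispatch())  # {'w1': ['order-0', 'order-2'], 'w2': ['order-1', 'order-3']}
print(len(orders.log))     # 4
```

Because each subscription only tracks a position in the log, a streaming reader and a pool of queue workers can consume the same messages independently without duplicating storage.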

Multi-Tenancy: Pulsar is the first and only data streaming platform to natively support multi-tenancy from day one. This feature is vital for enterprises and unicorns to reduce the total cost of ownership (TCO) when managing their data streaming infrastructure. Please check out our guide to Evaluating the Infrastructure Costs of Apache Pulsar and Apache Kafka.
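Multi-tenancy in Pulsar is visible in how topics are named: a fully qualified topic looks like persistent://tenant/namespace/topic, and a bare name such as orders resolves to the built-in public/default namespace. The short Python sketch below is an illustrative helper, not part of any Pulsar client library (TopicName and parse_topic are hypothetical names). It splits a name into those levels and assumes only the three-level form, rejecting anything else.

```python
from typing import NamedTuple


class TopicName(NamedTuple):
    domain: str     # "persistent" or "non-persistent"
    tenant: str     # top-level isolation unit, e.g. a team or customer
    namespace: str  # administrative unit inside a tenant
    local: str      # the topic itself


def parse_topic(name: str) -> TopicName:
    """Split a Pulsar-style topic name into its multi-tenant components.

    Hypothetical helper: a bare name such as "orders" is expanded to the
    public/default namespace; only the three-level URI form is accepted.
    """
    if "://" not in name:
        if "/" in name:
            raise ValueError(f"expected a bare name or full URI, got {name!r}")
        return TopicName("persistent", "public", "default", name)
    domain, rest = name.split("://", 1)
    if domain not in ("persistent", "non-persistent"):
        raise ValueError(f"unknown domain: {domain!r}")
    parts = rest.split("/")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected tenant/namespace/topic, got {rest!r}")
    return TopicName(domain, *parts)


t = parse_topic("persistent://acme/billing/invoices")
print(t.tenant, t.namespace, t.local)  # acme billing invoices
print(parse_topic("orders"))
# TopicName(domain='persistent', tenant='public', namespace='default', local='orders')
```

In Pulsar, policies such as permissions, retention, and backlog quotas are configured at the namespace level within a tenant, and this hierarchy is what lets a single cluster safely host many teams.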

Oxia: Scalable Metadata & Coordination: One of Pulsar’s newest innovations, Oxia, addresses the challenge of metadata scalability in large-scale environments. It provides a robust, scalable metadata and coordination layer that delivers the low-latency, high-performance coordination and state management essential for complex distributed systems. See our blog post: Introducing Oxia: Scalable Metadata and Coordination.

Geo-Replication: Pulsar’s innovative geo-replication feature, governed by policies, is widely adopted by businesses to meet disaster recovery (DR) requirements. See our blog post: Failover strategies deliver additional resiliency for Apache Pulsar.

Tiered Storage: Pulsar pioneered the use of object storage as tiered storage in the data streaming space. Today, tiered storage is a must-have capability for data streaming platforms, and Pulsar continues to lead the way. With the introduction of Lakehouse storage in the StreamNative Ursa Engine, we are extending tiered storage even further.

Reflecting on Pulsar’s development over the years, we are proud of how it has shaped today’s data streaming platforms. Many of the concepts and features Pulsar introduced have been embraced by peer communities and competitors alike, helping bring data streaming into the mainstream.

However, our journey of innovation is far from over. With the ongoing development of the Ursa Engine, we are taking Pulsar’s core architecture to new heights, redefining the standards for what an open data streaming platform should be. Ursa introduces several cutting-edge advancements, including addressing the New CAP Theorem, offering flexible deployment options across public and private clouds, and supporting BYOC, Dedicated, and Serverless deployments. It also enables seamless integration with lakehouses through table-stream duality and supports multiple semantics via a range of protocols, from Pulsar to Kafka, MQTT, and beyond.

We believe that Pulsar—and now Ursa—represent the future of data streaming. They are not just tools; they are the foundation upon which the next generation of data streaming platforms will be built.

From Pulsar Summit to Data Streaming Summit: Broadening Horizons

Another key milestone in Pulsar’s journey has been the evolution of the Pulsar Summit. What began as a niche gathering for Pulsar enthusiasts has transformed into the Data Streaming Summit, an upgraded industry event embracing the broader ecosystem of data streaming technologies.

The Data Streaming Summit isn’t just about Pulsar; it’s a platform for exchanging ideas on the latest trends, architectures, and innovations in data streaming. Our goal with these summits is to foster cross-community collaboration, uniting the best minds in open-source, cloud computing, data engineering, and real-time analytics to push the boundaries of what's possible with streaming data.

This transformation from Pulsar Summit to Data Streaming Summit mirrors our broader mission: to democratize data streaming by creating a platform that is open, scalable, and future-proof.

Looking Forward: Pulsar 4.0 and Ursa Engine at the Data Streaming Summit

As we look ahead to the next phase of Apache Pulsar, the upcoming Pulsar 4.0 release promises to be one of the most significant updates yet. It will introduce key innovations designed to make Pulsar an even more attractive open data streaming technology for multi-cloud and hybrid environments. From enhanced storage efficiency to improved latency handling, Pulsar 4.0 continues Pulsar’s evolution as a unified messaging and data streaming platform at scale.

As the Pulsar community continues to grow, much of its future revolves around the Ursa Engine. At the upcoming Data Streaming Summit, we’ll unveil exciting advancements in Ursa that will redefine what’s possible with data streaming and data lakehouses. Ursa will integrate more deeply with machine learning pipelines, stream processing, and generative AI, moving beyond powering data streams to delivering full-fledged real-time processing.

We invite everyone to join us at the Data Streaming Summit to explore these exciting developments and celebrate the remarkable achievements of the Apache Pulsar community, alongside the broader data streaming ecosystem. The next chapter in the data streaming journey is just beginning, and we look forward to continuing this adventure with all of you.

Here’s to the future of data streaming and to the community that makes it all possible. Thank you for being part of this incredible journey.

Matteo Merli
Matteo Merli is the CTO at StreamNative. Prior to this role, Matteo held the title of VP, Apache Pulsar at The Apache Software Foundation, beginning in June 2015. He was also a co-founder of Streamlio, and after Streamlio was acquired by Splunk, he worked there as a Sr. Principal Software Engineer. With a background in computer engineering, Matteo has a wealth of experience designing and developing large-scale distributed applications and working on projects involving network protocols and data analysis. His expertise extends to architecting and leading the development of Pulsar, a distributed pub-sub messaging platform.
Sijie Guo
Sijie’s journey with Apache Pulsar began at Yahoo!, where he was part of the team developing a global messaging platform for the company. He then moved to Twitter, where he led the messaging infrastructure group and co-created DistributedLog and Twitter EventBus. In 2017, he co-founded Streamlio, which was acquired by Splunk, and in 2019 he co-founded StreamNative. He is one of the original creators of Apache Pulsar and Apache BookKeeper, and remains VP of Apache BookKeeper and a PMC member of Apache Pulsar. Sijie lives in the San Francisco Bay Area of California.
