September 25, 2025
8 min read

Apache Pulsar, Seven Years On: What We Built, What We Learned, What’s Next

Matteo Merli
Co-Founder and CTO, StreamNative
Sijie Guo
Co-Founder and CEO, StreamNative

A Vibrant Community Driving Innovation

Seven years ago, Apache Pulsar graduated to become a Top-Level Project at the Apache Software Foundation. In that time, its community has blossomed into one of the most vibrant and innovative in open source. What began as a project incubated at Yahoo has evolved into a global collaboration with hundreds of contributors. By 2025, Pulsar had crossed 700+ contributors on the main repository and amassed over 13,000 commits, alongside 14,000+ GitHub stars and thousands of users on Slack. The momentum only continues to build – the recent Apache Pulsar 4.1 release alone incorporated 560+ community-driven improvements, a testament to the project’s accelerating innovation velocity. As we reflect on this journey, we are humbled by the passionate individuals worldwide who have shared our vision. Each Pull Request, each question answered on Slack, and each community meetup adds to a welcoming, can-do vibe that defines Pulsar. It’s no exaggeration to say the project’s stability, scalability, and security today are direct results of this community-powered effort. We are grateful to every one of you who has been part of Pulsar’s story so far.

Those community efforts have made Pulsar a truly battle-tested technology. Our commitment to open source means that every new feature and fix is driven by real-world needs. Our developer community has hosted Pulsar meetups and summits across continents, sharing knowledge and celebrating successes. Whether it’s late-night discussions on the mailing list or collaborative design of a Pulsar Improvement Proposal (PIP), the energy and openness of this community continue to amaze us. The Apache way – “community over code” – is alive and well in Pulsar. Seven years in, we feel like we’re just getting started.

The #1 Production-Proven Distributed Message Queue

Nothing speaks louder than real-world adoption. Pulsar today is the #1 production-proven distributed message queue for many of the world’s most demanding use cases. In the financial services sector, Pulsar has become a critical backbone for high-volume payment systems. For example, Tencent – one of Asia’s largest tech companies – chose Pulsar to redesign its billing platform that processes tens of billions of financial transactions with virtually zero data loss. Handling hundreds of millions of dollars in transactions per day, Tencent’s billing service could not afford downtime or inconsistency. After evaluating many messaging systems, they found Pulsar’s enterprise-grade reliability and scalability to be unmatched – and indeed, after migrating, they run at massive scale “with virtually no data loss”. This kind of confidence is why Pulsar is trusted in payment processing and banking environments where every message (every transaction, trade, or tick) counts.

Even organizations outside traditional finance have benefited from Pulsar’s rock-solid design. Cisco’s IoT Control Center, for instance, replaced a legacy messaging broker with Pulsar to manage 245 million connected devices and 4.5 billion API calls per month across 35,000 enterprise customers. In such a massive IoT deployment – from connected cars to smart city sensors – Cisco needed a system that was, in their own words, “reliable, scalable, and had extremely light overhead. Everything needs to be geo-replicated and secure…”. Pulsar met those requirements, providing the low-latency, geo-replication, and multi-tenancy needed to ensure that “devices should not lose connectivity no matter what.” This is a powerful endorsement: when a Fortune 100 company entrusts Pulsar with critical real-time infrastructure, it validates Pulsar’s production readiness on a grand scale.

The sports betting and online gaming industry is another domain where Pulsar’s strengths shine. In this high-stakes arena, real-time data is the lifeblood – odds and game events must propagate globally in milliseconds. We’ve seen leading betting platforms gravitate to Pulsar for its ultra-low latency and high throughput. Pulsar’s ability to handle millions of events per second with strict ordering, combined with features like geo-replication and partitioned topics, makes it ideal for powering live odds feeds and in-game analytics. In sports betting, every millisecond of delay can mean lost revenue or arbitrage – Pulsar’s architecture was built to minimize such delays. While some of these companies prefer to keep a low profile, we can say confidently that Pulsar now underpins real-time betting systems that deliver seamless experiences to users even during the busiest sports events. It’s incredibly exciting to see Pulsar enabling new levels of performance in an industry where real-time is truly real-time.

Modern SaaS platforms have also embraced Pulsar to drive their core business workflows. Two notable examples are Iterable and Attentive – both high-growth marketing tech companies operating at massive scale. Iterable, a customer engagement platform, famously replaced RabbitMQ and even Kafka with Pulsar to unify its messaging backbone. Why? As Iterable’s engineers put it, Pulsar provided the right balance of scalability, reliability, and rich features to consolidate multiple systems into one. Pulsar’s unique combination of streaming and queueing in a single system allowed Iterable to handle billions of events per day, powering hyper-personalized, real-time marketing and customer engagement. Attentive, an AI-powered marketing platform for leading brands, similarly chose Pulsar as the backbone of its messaging system, ensuring the delivery of billions of messages with exceptional reliability and scale. They leveraged Pulsar’s built-in subscription modes to achieve fine-grained message exclusivity and high fan-out at scale – crucial for their use case of sending personalized messages to millions of consumers. Other SaaS innovators like InnerSpace are using Pulsar to ingest and analyze sensor data in real time (improving workplace safety and operational efficiency). Across these examples, a common theme emerges: Pulsar’s multi-tenancy, horizontal scalability, and durability give companies the confidence to centralize on one messaging platform. They no longer need one system for queues and another for streaming – Pulsar handles both paradigms seamlessly, reducing complexity and operational burden.

Looking across industries, we see Pulsar enabling everything from online banking and payment processing, to ticketing and logistics, to social media and gaming. The breadth of adoption speaks to Pulsar’s flexibility. It can be a high-throughput event stream feeding big data pipelines, and it can act as a persistent queue guaranteeing message delivery for mission-critical workflows – all in the same architecture. Features like tiered storage mean Pulsar can retain data as long as needed (months or years of events) without compromising performance, allowing use cases like auditing and reprocessing. Features like geo-replication and multi-region clustering mean enterprises can deploy Pulsar across data centers and clouds for disaster recovery and data locality, with out-of-the-box support. Simply put, Pulsar today offers the most complete feature set in the messaging space, which is why so many organizations have standardized on it.

Why Pulsar Matters in the AI Era

We’re now living in the era of AI – where real-time data streams fuel intelligent applications and autonomous agents. In this landscape, a robust messaging foundation is more important than ever. Apache Pulsar was born cloud-native and event-driven, so it’s no surprise that many cutting-edge AI platforms have chosen Pulsar as their data backbone. The reason is simple: modern AI workflows often involve orchestrating many microservices, data pipelines, and model outputs in real time. To do this reliably at scale, you need a messaging layer that can handle high throughput, guarantee delivery, enforce schemas, and scale horizontally – exactly Pulsar’s strengths.

Take Tencent’s Angel PowerFL (Federated Learning) platform as an example. This distributed machine learning system at Tencent had stringent requirements for stability, low latency, and data privacy across trillions of training tasks. After benchmarking different solutions, the team adopted Pulsar for the federated data synchronization, concluding that Pulsar provided the stability, reliability, and scalability their ML platform required. In production, Pulsar has lived up to the task, ensuring that model updates and gradients are streamed efficiently and securely between participants in the federated learning network. When an AI system is coordinating learning across banks or hospitals (where data can’t be centralized), Pulsar’s multi-tenant and geo-replicated design becomes a critical enabler – it allows data scientists to focus on models, knowing the data movement “just works.”

Another great example is TrustGraph, an open-source AI platform for building knowledge graphs and LLM-powered agents. TrustGraph’s architecture is built from the ground up on Pulsar’s publish-subscribe model. Why? Because they needed a backbone that ensures real-time processing, fault tolerance, and parallel workflows as data flows through their pipeline of extractors, transformers, and AI agents. The TrustGraph founders, coming from enterprise AI backgrounds, deliberately chose Pulsar to overcome the reliability and scaling limitations they saw in other frameworks. Pulsar’s ability to handle streaming data and event-driven triggers means TrustGraph can chunk and analyze huge unstructured datasets (like entire law libraries or aerospace manuals) with a network of cooperating AI agents – all without breaking the flow of data. In short, Pulsar is the “glue” that holds together the complex moving parts of an AI system, from ingestion to inference.

We’ve also seen AI startups leveraging Pulsar to do things that simply weren’t possible with legacy queues or log systems. A company like Unify – which built an AI-driven go-to-market platform – is a great case in point. Backed by the OpenAI Startup Fund, Unify set out to deliver instant AI insights on streaming customer events. Early on, they realized that a patchwork of cron jobs and Amazon SQS queues wouldn’t scale or meet their latency goals. They turned to Pulsar (via StreamNative Cloud) to handle tens of millions of events per day in real time, powering an AI that scores leads and triggers workflows in seconds. Pulsar allowed them to consolidate what would have been multiple subsystems – message queuing, pub/sub, event storage, scheduling – into one simple platform. With features like message replay, delayed delivery, and topic compaction, Unify’s small engineering team achieved capabilities that rival those of much larger organizations. They can reprocess historical events to improve their ML models, schedule automated follow-ups without external schedulers, and guarantee that no data is lost even if an AI consumer goes down. As Unify’s founding ML engineer put it, Pulsar gave them “peace of mind” to deploy new AI features without worrying about missing events. This agility is priceless in the fast-moving AI domain.
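To make the delayed-delivery idea concrete, here is a stdlib-only toy model of the semantics: messages carry a deliver-at time and only become visible to consumers once that time has passed. (In the real Pulsar client this is exposed as a delayed-delivery option when producing; the class and field names below are illustrative, not the client API.)

```python
import heapq

class DelayedQueue:
    """Toy model of Pulsar-style delayed delivery: a published message
    becomes visible to consumers only once its deliver-at time arrives."""

    def __init__(self):
        self._heap = []  # (deliver_at, seq, payload)
        self._seq = 0    # tie-breaker that preserves publish order

    def publish(self, payload, now, deliver_after=0):
        heapq.heappush(self._heap, (now + deliver_after, self._seq, payload))
        self._seq += 1

    def poll(self, now):
        """Return every message whose delivery time has arrived."""
        due = []
        while self._heap and self._heap[0][0] <= now:
            _, _, payload = heapq.heappop(self._heap)
            due.append(payload)
        return due

q = DelayedQueue()
q.publish("welcome-email", now=0)                  # deliver immediately
q.publish("follow-up", now=0, deliver_after=3600)  # deliver in an hour
print(q.poll(now=0))     # ['welcome-email']
print(q.poll(now=3600))  # ['follow-up']
```

This is why a team like Unify’s needs no external scheduler: the broker itself holds back the "follow-up" message until it is due, and replaying history is just re-reading retained messages from the log.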

Crucially, Pulsar’s design principles align with the needs of AI systems. Strict message ordering and backpressure management ensure that event streams remain consistent – so an AI’s decisions based on those events remain correct. Built-in schema registry support means producers and consumers can evolve data formats in a controlled way, and Pulsar will reject incompatible producers – preventing bad data from silently corrupting an ML pipeline. In fact, imagine an AI application trying to consume messages from a topic and suddenly encountering an unexpected schema change that breaks its parser. In Pulsar, that scenario is avoidable by design: you can enforce schemas at the topic level, something not possible in Kafka without external add-ons. Similarly, Pulsar’s Dead Letter Queue (DLQ) and Negative Acknowledgment features are a godsend for AI workflows. If an AI microservice fails to process certain events (perhaps an image is too large, or a model isn’t available), Pulsar can automatically route those events to a DLQ for later inspection or reprocessing. This kind of resiliency ensures that one hiccup in an AI pipeline doesn’t require shutting everything down – the show goes on, and engineers can address the outliers afterward. As AI applications mature, these operational safeguards separate the toy projects from the production-grade platforms.
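The negative-acknowledgment and DLQ flow can be sketched in a few lines. This is a minimal, stdlib-only simulation of the semantics – a failed delivery counts as a nack and triggers redelivery, and a message that exhausts its retries is routed to a dead-letter list rather than blocking the stream. (In the real client this is configured with a dead-letter policy and a max-redeliver count; the function below is a toy, not the pulsar-client API.)

```python
def process_with_dlq(messages, handler, max_redeliver_count=3):
    """Deliver each message to handler(); an exception counts as a
    negative ack and triggers redelivery, up to max_redeliver_count
    extra attempts. Messages that still fail land in the DLQ list."""
    delivered, dead_letter = [], []
    for msg in messages:
        for _ in range(1 + max_redeliver_count):  # first try + redeliveries
            try:
                handler(msg)
                delivered.append(msg)  # ack: processing succeeded
                break
            except Exception:
                continue               # nack: "broker" redelivers
        else:
            dead_letter.append(msg)    # retries exhausted -> DLQ
    return delivered, dead_letter

def handler(msg):
    if len(msg) > 10:                  # e.g. "an image is too large"
        raise ValueError("payload too large")

ok, dlq = process_with_dlq(["small", "x" * 100], handler)
print(ok)        # ['small']
print(len(dlq))  # 1
```

The key property is the last line of the loop: the oversized event ends up in the DLQ for later inspection, while the rest of the stream keeps flowing.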

Pulsar is even proving its value in cutting-edge areas like real-time computer vision and edge AI. Safari AI is a startup that helps enterprises monitor physical operations (think real-time occupancy counting, queue detection, etc.) using their existing security cameras and ML models. As they scaled to managing 10,000+ video streams from 50,000+ cameras, Safari AI found that Kafka and Kinesis were not cost-effective or agile enough. They migrated to Pulsar via StreamNative and saw a 50% reduction in cloud costs while easily supporting their complex ML data pipelines. In the words of Safari’s co-founder, “StreamNative’s resilience is critical to our SaaS operations… the best choice – cutting our costs by more than 50% while seamlessly supporting our ML data structure requirements.” With Pulsar’s tiered storage and schema management, Safari AI was able to retain a year’s worth of video event data for analysis, maintain sub-10s end-to-end latency in delivering metrics, and do it all without a large DevOps team. This story encapsulates why message queues like Pulsar are vital in the AI era: they let companies focus on building intelligent features rather than reinventing streaming infrastructure. As AI continues to proliferate – from real-time fraud detection, to autonomous vehicles, to personalized content feeds – we believe Apache Pulsar will be the go-to nervous system that connects data to intelligence reliably at scale.

Built for the Future: Bringing Pulsar’s Philosophy to Kafka via the Ursa Engine

From the beginning, Apache Pulsar was designed to address the shortcomings we saw in earlier messaging systems like Kafka. Many of the features that Pulsar pioneered over the last 7 years have only become more relevant with time. We sometimes like to play a thought experiment: What if Apache Kafka had originally been designed with some of Pulsar’s core features? For instance, imagine if Kafka had clear multi-tenancy and isolation through namespaces by default – how much easier would self-service streaming be in large organizations! Imagine if Kafka could validate schemas at the broker and reject producers sending invalid data, preventing nasty surprises downstream. What if Kafka had built-in Dead Letter Queues and negative acknowledgments, allowing applications to handle failures gracefully without external tools? Or if it could run lightweight serverless functions directly on the cluster, enabling simple event transformations and routing on the fly? Some of you know that these aspects – multi-tenancy, strong schema enforcement, developer-friendly features – are very dear to us. These aren’t just “nice-to-haves” – they solve real pains for large organizations that use streaming at scale. The good news is that all of these “what if” features already exist today. They exist in Apache Pulsar. In many ways, Pulsar has been ahead of the curve, integrating capabilities that developers ended up needing as their deployments grew.

It’s gratifying to see the broader ecosystem acknowledge these innovations. We’ve watched over the years as Kafka users and cloud vendors bolted on solutions for some of these problems (Kafka “schema registry” servers, Kafka Streams and Connect, kludgy multi-tenant clusters, etc.), confirming that the problems Pulsar set out to solve were very much real. Pulsar’s holistic approach – a multi-layer architecture separating compute and storage, built-in geo-replication, first-class multi-tenancy – was the result of lessons learned operating global-scale messaging at Yahoo. That DNA of innovation continues to guide Pulsar’s evolution. The recent Pulsar 4.1 release is evidence: enhancements in 4.x have improved reliability, performance and operability. It’s no wonder the Pulsar community can implement over 560 improvements in one release cycle – we are moving quickly to keep Pulsar the most advanced platform of its kind.

Yet, we also recognize that not everyone is on Pulsar (yet!). There are many existing applications and data platforms built on Kafka that, for various reasons, cannot migrate easily. As enthusiasts of streaming tech, we want to see the benefits of Pulsar’s innovations shared as widely as possible, even by those who haven’t made the switch. That philosophy led us to our next big project: the Ursa Engine. Ursa is our effort to bring the core ideas of Pulsar – its architecture and lessons – to the Kafka ecosystem. We sometimes describe Ursa as “Pulsar’s technology applied to Kafka’s API”, though in truth it’s more than that. Under the hood, Ursa is a brand new streaming engine that combines a leaderless architecture with a lakehouse-centric storage model. In practical terms, this means Ursa can serve Kafka topics with Pulsar-like efficiency and scalability. Like Pulsar, it decouples compute and storage, using object storage and a lakehouse format (Apache Iceberg/Delta) for message persistence. This eliminates much of the operational pain that Kafka clusters traditionally face around data retention and cluster rebalancing. With Ursa, we can achieve cost-effective, high-throughput streaming without the overhead of maintaining multiple replicas of data on local disks – instead, data is persisted once to durable storage, and brokers are stateless processing nodes. This leaderless design also avoids the fragility of a single leader per partition; no more controller elections or hot partitions as in Kafka’s world. In short, Ursa takes the scalability of Pulsar’s architecture and makes it available to Kafka users, so they can grow beyond the limits of the old Kafka design.

We’re incredibly excited about Ursa, not only for what it does for Kafka compatibility, but also for what it means for the data streaming ecosystem. Ursa is fully compatible with Pulsar as well – it’s an engine that can speak multiple protocols (Kafka, Pulsar, etc.) on top of a next-gen storage layer. This is why we say Pulsar’s philosophy continues at the heart of Ursa: we’re effectively bringing Pulsar’s ideas into a form that can be adopted by the Kafka community, bridging two ecosystems for the benefit of all. The early results have been very promising. In fact, our Ursa Engine research was recognized with the VLDB 2025 Best Industry Paper award, highlighting Ursa as the first “lakehouse-native streaming engine for Kafka.” Our vision is that in the coming years, whether you come from the Pulsar world or the Kafka world, you’ll have access to a unified data storage foundation that combines the best of both. Pulsar will continue to thrive and evolve (with a 5.0 LTS on the horizon and more novel features in development), and Kafka-based users will also be able to enjoy those advancements through the Ursa-powered storage foundation. It truly feels like we’re entering a new chapter where the lines between “Kafka or Pulsar” fade away, and the focus shifts to capabilities and outcomes. We want to make streaming data easier, more affordable, and more powerful for everyone.

Conclusion: A Personal Thank-You and Onward to the Future

As we celebrate Apache Pulsar’s seven-year anniversary, we – Sijie and Matteo – want to take a moment to reflect on the journey with gratitude. When we started building Pulsar, we imagined a system that could serve as the unified messaging fabric for cloud applications; we believed in a design that challenged the status quo and put developers first. Seeing that vision validated – by a vibrant community and by adoption at some of the world’s top companies – is deeply rewarding on a personal level. More than anything, we are thankful to the Pulsar and broader data streaming community: every user, every contributor, every champion who advocated for Pulsar in their organization. You have made Pulsar not just a technology, but a movement. The energy and optimism we feel from this community keeps us motivated every single day.

The vibe around Pulsar has always been one of innovation and inclusivity. It’s not just about writing code; it’s about helping each other succeed with event-driven architectures, it’s about welcoming newcomers on Slack, it’s about continuing to push the boundaries of what a messaging system can do. To the many organizations that put their trust in Pulsar, we thank you for your confidence – your success stories are our proudest achievements. Knowing that Pulsar helped cut costs in half for a startup, or ensured zero data loss in a bank, or delivered instant experiences in a mobile app – that’s what this is all about.

Looking ahead, we are more excited than ever. The next wave of challenges – agentic workloads, global-scale data sharing, fully autonomous systems – is exactly the kind of work Pulsar was built to handle. With the community’s help, Pulsar will continue to evolve rapidly. Features like unified stream/table storage (via Ursa), deeper serverless function integration, and even more ecosystem connectors are on the horizon. We also remain committed to making Pulsar easy to adopt: from improving documentation and onboarding, to offering managed services and training, we want to ensure anyone who can benefit from Pulsar has a smooth path to do so.

In closing, we want to encourage everyone reading this: if you’re already part of the Pulsar community, thank you for an amazing seven years – let’s raise a toast to how far we’ve come. If you’re new to Pulsar or considering it, come join us! There’s never been a better time to get involved, whether by trying out Pulsar 4.1, contributing to a GitHub issue, or attending an upcoming Data Streaming Summit. We co-founders remain as approachable as ever – find us on Slack, at conferences, or via the Pulsar or StreamNative community channels – we love hearing your feedback and ideas. Apache Pulsar’s journey from an incubating project to a world-class messaging and streaming platform has been a thrilling ride, and it’s still early days. With this community and our relentless drive to innovate, we’re confident the best is yet to come. Here’s to the next seven years and beyond – onwards and upwards with Pulsar!

Matteo Merli
Matteo is the CTO at StreamNative, where he brings rich experience in distributed pub-sub messaging platforms. Matteo was one of the co-creators of Apache Pulsar during his time at Yahoo!. Matteo worked to create a global, distributed messaging system for Yahoo!, which would later become Apache Pulsar. Matteo is the PMC Chair of Apache Pulsar, where he helps to guide the community and ensure the success of the Pulsar project. He is also a PMC member for Apache BookKeeper. Matteo lives in Menlo Park, California.
Sijie Guo
Sijie’s journey with Apache Pulsar began at Yahoo! where he was part of the team working to develop a global messaging platform for the company. He then went to Twitter, where he led the messaging infrastructure group and co-created DistributedLog and Twitter EventBus. In 2017, he co-founded Streamlio, which was acquired by Splunk, and in 2019 he founded StreamNative. He is one of the original creators of Apache Pulsar and Apache BookKeeper, and remains VP of Apache BookKeeper and PMC Member of Apache Pulsar. Sijie lives in the San Francisco Bay Area of California.
