Ursa Everywhere: Paving the Path to a Lakehouse-Native Future for Data Streaming

Today at the Data Streaming Summit 2025, we are thrilled to announce a major leap in StreamNative’s product evolution. Ursa – our next-generation lakehouse-native data streaming engine – is now being made available across every deployment model. In this announcement, we recap what Ursa is (including its recent accolade as VLDB 2025 Best Industry Paper), revisit the history of the Classic Pulsar Engine versus the new Ursa Engine, and unveil how we’re enabling Ursa’s storage layer as a Lakehouse extension for Classic Engine clusters. This new capability works as a tiered storage extension on all Classic clusters (Serverless, Dedicated, and BYOC), allowing current Pulsar users to start leveraging Ursa’s innovative lakehouse storage today. By doing so, we’re ensuring a smooth upgrade path from the Classic Engine to Ursa Engine in the near future. Our vision is to make Ursa’s stream storage format an open standard for streaming data, benefiting not just StreamNative customers but the broader Apache Pulsar and Kafka communities as well.
Ursa Engine: Kafka Compatibility Meets Lakehouse Innovation
Ursa Engine is our answer to the need for a more cost-effective, cloud-native streaming platform without sacrificing the developer experience that Apache Kafka made popular. In contrast to legacy architectures, Ursa is fully Kafka API-compatible yet fundamentally different under the hood. It is the first “lakehouse-native” streaming engine – built to write data directly to cloud object storage in open table formats (like Apache Iceberg and Delta Lake) instead of persisting to proprietary broker disks. By eliminating the traditional leader-based replication and external ETL connectors, Ursa’s architecture slashes streaming infrastructure costs by roughly 90–95% (a 10× or greater reduction) while maintaining seamless compatibility with existing Kafka applications. In other words, users get the same Kafka experience but backed by a modern, cloud-optimized design that decouples compute from storage for elastic scalability. This radical approach delivers high performance with dramatically lower operational overhead, allowing organizations to focus on data and workloads rather than low-level infrastructure.
Ursa’s innovative design has not gone unnoticed. This year, our paper “Ursa: A Lakehouse-Native Data Streaming Engine for Kafka” received the Best Industry Paper award at the prestigious VLDB 2025 conference. The VLDB recognition underscores the significance of Ursa’s leaderless, lakehouse-integrated approach to streaming – validating that Ursa represents a breakthrough in marrying real-time streams with open data lakehouse systems. We’re incredibly honored by this award and energized to continue pushing the state of the art in streaming data technology.
(For those interested in the technical deep-dive, you can read the VLDB 2025 paper for a comprehensive look at Ursa’s design.)
From Classic Pulsar to Ursa: A Tale of Two Engines
To understand the importance of today’s announcement, it helps to look at how the Classic Pulsar Engine and the Ursa Engine differ. The Classic Engine refers to the original Apache Pulsar architecture that StreamNative Cloud has run for years. It relies on Apache ZooKeeper for metadata coordination and Apache BookKeeper for durable, low-latency storage of messages. This compute-and-storage-separation design powers many mission-critical systems by providing ultra-low latency message delivery and strong consistency. Classic Pulsar is also versatile – it supports not only the Pulsar protocol but also Kafka (via Kafka-on-StreamNative) and MQTT, allowing it to speak multiple messaging APIs on a single platform. Today, the Classic Engine remains the default in StreamNative Cloud, available in all deployment modes (Serverless, Dedicated, BYOC) and trusted for workloads that demand the absolute lowest end-to-end latencies.
Ursa Engine was born from the recognition that cloud-era workloads often prioritize cost efficiency and scalability alongside latency. Ursa is built on the Apache Pulsar foundation but reimagines key components for a more flexible and scalable architecture. Instead of ZooKeeper, Ursa uses Oxia – a new scalable metadata store – to manage coordination. Instead of being tied only to BookKeeper for storage, Ursa’s brokers are stateless and leaderless, persisting data directly to cheap and reliable object storage (like AWS S3, GCS, Azure Blob) in open table formats. In short, Ursa shifts from the ZooKeeper-based, disk-centric model of Classic Pulsar toward a “headless” stream storage architecture, using Oxia for metadata and S3/Object Store for durability. This design trades a bit of write latency for massive gains in throughput, cost efficiency, and simplicity.
Key differences between the Classic Engine and Ursa Engine include:
- Metadata management: Classic Pulsar uses ZooKeeper for cluster metadata and coordination; Ursa replaces this with Oxia, a horizontally scalable and highly available metadata store. This removes the scaling and maintenance challenges of ZooKeeper in large clusters.
- Storage layer: Classic relies on BookKeeper (persistent disks) for storing message data, which offers very low latency. Ursa uses cloud object storage as its primary storage, writing data as files in open formats (Iceberg/Delta) for long-term durability. BookKeeper in Ursa is optional and only used for topics that demand the absolute lowest latency, whereas in Classic it’s the only storage.
- Architecture: Classic Engine brokers use a leader-based model for each topic partition (managed by ZooKeeper), and data is replicated broker-to-bookie. Ursa’s brokers are leaderless and stateless – any broker can handle any partition – with replication offloaded to the shared storage layer. This eliminates leader election downtime and cross-datacenter replication traffic, simplifying operations.
- Protocols and compatibility: Classic Engine supports Pulsar’s native protocol and Kafka out-of-the-box. Ursa Engine is currently focused on 100% Kafka API compatibility (Pulsar protocol support is on the roadmap). Despite different internals, Ursa presents the Kafka interface so that existing Kafka clients and applications work unchanged. (In StreamNative Cloud, you choose Classic vs Ursa engine when creating a cluster instance, but either way you can connect with Kafka clients.)
These changes make Ursa ideal for cost-sensitive, latency-relaxed workloads in the cloud, whereas Classic Pulsar excels for ultra-low-latency requirements. It’s worth noting that Ursa Engine is generally available on AWS today (with Public Preview on Azure and GCP). Many of our customers run large-scale Classic clusters and are interested in Ursa’s benefits, but until now, moving from Classic to Ursa meant planning a migration or starting a new cluster. After all, you can’t simply “flip a switch” on a running Pulsar cluster to become an Ursa cluster – Ursa’s use of Oxia and S3 storage is fundamentally different and cannot be retrofitted into an existing Classic cluster without downtime or data migration. This is the challenge we set out to solve: how to bring Ursa’s advantages to existing Pulsar deployments in a seamless way.
Ursa Storage Extension for Classic Pulsar: Lakehouse for All Deployments
Today’s announcement addresses that challenge head-on: we are introducing Ursa Stream Storage as a Lakehouse tiered storage extension for the Classic Engine. In practical terms, this means any Classic Pulsar cluster – including Serverless, Dedicated, and BYOC – can now take advantage of Ursa’s lakehouse-based storage layer without immediately switching to the Ursa Engine brokers. This extension works as a tiered storage plugin for Classic Pulsar clusters, allowing them to offload and store data in the same open table format that Ursa uses. With a configuration change, your Pulsar topics can be automatically persisted to long-term cloud storage (e.g. S3) in Apache Iceberg or Delta Lake format, alongside the traditional BookKeeper storage. Think of it as upgrading the back-end storage of your Classic cluster to speak the “Ursa language” of the lakehouse.
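To make the “configuration change” concrete, here is a rough sketch of what enabling the extension could look like in a broker configuration. Apache Pulsar’s existing tiered-storage mechanism is driven by offload settings of this shape; note that the `lakehouse` driver value and the table-format key below are hypothetical placeholders for illustration, not StreamNative’s actual setting names.

```properties
# Illustrative sketch only. Pulsar's real offload settings follow this
# pattern, but the "lakehouse" driver value and the table-format key are
# invented placeholders, not documented StreamNative configuration.
managedLedgerOffloadDriver=lakehouse
managedLedgerOffloadBucket=my-streaming-lakehouse   # assumed bucket name
lakehouseTableFormat=iceberg                        # or "delta" (hypothetical key)
managedLedgerOffloadThresholdInBytes=0              # offload data as it arrives
```

The key point is that this is a storage-policy change on the cluster, not a change to producer or consumer code.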
We first previewed this concept in late 2023 as “Pulsar’s Lakehouse Tiered Storage”, showing how Pulsar could adopt open, industry-standard storage formats as a tiered storage layer instead of using Pulsar’s proprietary segment format for offloading. By integrating with table formats like Delta Lake and Apache Iceberg, that development effectively transformed Apache Pulsar into a Streaming Lakehouse, allowing users to ingest data through Pulsar and have it land directly in their data lakehouse storage. Since then, we’ve refined and tested this approach with our users. Now, as a culmination of that work, Ursa Stream Storage is becoming available as a fully supported feature for all Classic Engine clusters – bringing the power of lakehouse tiered storage to every deployment.
What does this mean for Classic Pulsar users? In short, you get the best of both worlds:
- Low-latency streaming from BookKeeper for your real-time consumers (ensuring no impact to the snappy performance you rely on for “hot” data), plus
- Automatic long-term storage of all data in cost-efficient object storage as Iceberg/Delta tables. This long-term tier is managed by the system – as messages age out from BookKeeper, they’re already safely stored in the lakehouse format, without any external connectors or ETL jobs needed.
Once enabled, the Ursa storage extension continuously converts your Pulsar topic streams into analytics-friendly Parquet files in the background (using the same compaction approach pioneered by Ursa Engine). Your streaming data becomes immediately available for batch querying or AI/analytics pipelines via tools like Spark, Trino, or Snowflake – no separate export step required. Essentially, Classic Pulsar clusters can now produce their own lakehouse tables as a byproduct of streaming, aligning with Ursa’s “stream-table duality” design. And because the data is stored in an open format, you maintain full control and portability – the data in S3 is yours to query with any engine, or even to share across different systems.
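The stream-to-table compaction idea can be sketched in a few lines. The toy model below is not Ursa’s implementation: plain JSON files stand in for Parquet/Iceberg, a local directory stands in for S3, and an in-memory buffer stands in for the BookKeeper “hot” tail. It only illustrates the shape of the mechanism: records stream in, get batched into immutable columnar files, and a batch engine can then scan those files as a table.

```python
import json
import os
import tempfile
from collections import defaultdict

class LakehouseCompactor:
    """Toy model of stream-to-table compaction.

    JSON files stand in for Parquet/Iceberg data files; a local
    directory stands in for object storage. Illustrative only.
    """

    def __init__(self, table_dir, batch_size=3):
        self.table_dir = table_dir
        self.batch_size = batch_size
        self.buffer = []   # the "hot" tail of the stream (BookKeeper's role)
        self.file_seq = 0

    def append(self, record):
        """Ingest one streamed record; compact when a batch is full."""
        self.buffer.append(record)
        if len(self.buffer) >= self.batch_size:
            self._compact()

    def _compact(self):
        """Flush the buffered batch as one immutable, column-oriented file."""
        cols = defaultdict(list)
        for record in self.buffer:
            for key, value in record.items():
                cols[key].append(value)
        path = os.path.join(self.table_dir, f"part-{self.file_seq:05d}.json")
        with open(path, "w") as f:
            json.dump(cols, f)
        self.file_seq += 1
        self.buffer = []

    def table_scan(self):
        """What a batch engine would see: all compacted files, as rows."""
        rows = []
        for name in sorted(os.listdir(self.table_dir)):
            with open(os.path.join(self.table_dir, name)) as f:
                cols = json.load(f)
            keys = list(cols)
            rows += [dict(zip(keys, vals))
                     for vals in zip(*(cols[k] for k in keys))]
        return rows

# Usage: stream six records, then "batch query" the compacted table.
tmp = tempfile.mkdtemp()
compactor = LakehouseCompactor(tmp, batch_size=3)
for i in range(6):
    compactor.append({"id": i, "v": i * 10})
rows = compactor.table_scan()
```

No consumer-facing behavior changes in this model: the stream keeps flowing while the table files accumulate as a byproduct, which is the essence of the “stream-table duality” described above.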
From an architecture standpoint, this extension leverages Pulsar’s built-in tiered storage mechanism but swaps the storage format to the open table format. There’s no change required in your producers or consumers – Pulsar continues to serve data to them as it always did. Internally, new writing and offloading policies ensure that every message published to Pulsar is durably written to the object storage tier (and compacted into table format) in addition to the BookKeeper ledgers. This means durability and throughput actually increase (object storage can handle very high throughput writes), while BookKeeper handles the tail of the stream for ultra-fast reads.
Crucially, Ursa Stream Storage for Classic clusters is available across all our cloud deployment options. Whether you run on our multi-tenant Serverless offering, have an isolated Dedicated cluster, or deploy in your own cloud (BYOC), you can take advantage of this feature to turn your Pulsar cluster into a hybrid streaming/lakehouse system. By making Ursa’s storage layer ubiquitous, we ensure that every StreamNative customer – not just those who spin up brand new Ursa clusters – can reap the benefits of lakehouse-native streaming.
Paving the Way for Seamless Upgrades from Classic to Ursa
Perhaps the most exciting aspect of offering Ursa’s storage layer on Classic Pulsar is that it paves a clear path to upgrade your streaming engine in the future. Adopting the Ursa storage extension today is essentially future-proofing your Pulsar deployment. Once your data is flowing into the Ursa (lakehouse) storage tier, the hardest part of an Ursa migration is already done! All of your topic history is sitting in object storage as Iceberg/Delta tables, just as the Ursa Engine expects it. This means that when the time is right – for example, when Ursa Engine becomes generally available on your cloud of choice, or when your workload profile shifts to favor Ursa’s strengths – you can swap out the Classic Engine brokers for Ursa Engine brokers without re-ingesting or migrating data. The new Ursa brokers can attach to the existing S3 or Blob storage bucket and immediately take over serving the data from the same unified log/table, picking up exactly where the Classic brokers left off (with full consistency).
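The “attach and take over” step can be illustrated with a small sketch. This is not Ursa’s actual recovery logic; the file layout (JSON files holding an `offset` column) and the function name are invented for illustration. The point it demonstrates is that because the shared storage holds the full log as table files, a new engine can derive its resume position purely from that storage, with no data copied from the old brokers.

```python
import json
import os
import tempfile

def resume_position(table_dir):
    """Toy sketch of an engine swap: a replacement broker attaches to the
    storage directory the old cluster populated and computes where to
    resume, using only the shared table files. Layout is invented."""
    last_offset = -1
    for name in sorted(os.listdir(table_dir)):
        with open(os.path.join(table_dir, name)) as f:
            cols = json.load(f)
        if cols.get("offset"):
            last_offset = max(last_offset, max(cols["offset"]))
    return last_offset + 1  # next offset the new brokers should serve

# Demo: files the "old" cluster left behind in shared storage.
shared_dir = tempfile.mkdtemp()
with open(os.path.join(shared_dir, "part-00000.json"), "w") as f:
    json.dump({"offset": [0, 1, 2]}, f)
with open(os.path.join(shared_dir, "part-00001.json"), "w") as f:
    json.dump({"offset": [3, 4]}, f)
next_offset = resume_position(shared_dir)
```

In the real system the resume point would come from table metadata (e.g. an Iceberg snapshot) rather than scanning files, but the principle is the same: the storage layer, not the retiring brokers, is the source of truth.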
In essence, enabling Ursa storage on a Classic cluster is like laying down railroad tracks for an eventual engine swap: you continue to run the Classic locomotive for now, but the tracks (data format) are already compatible with the new high-speed engine when you’re ready to switch. This approach minimizes risk and downtime. You don’t have to maintain two parallel pipelines or perform a big-bang migration of all your historical data. Your producers and consumers can remain connected during the transition, since the Kafka/Pulsar protocols they see don’t change – only the engine behind the scenes does. Our goal is to make moving to Ursa Engine as simple as a rolling upgrade when the time comes.
We understand that many organizations have significant investment in their existing Pulsar clusters (with tailored configurations, client applications, and operational knowledge). With the Ursa storage extension, we’re ensuring those investments continue to pay off. You can incrementally adopt Ursa’s benefits (like cost savings and lakehouse integration) without immediately changing your entire system. Over time, as you gain confidence and as Ursa Engine matures with more features (e.g. Pulsar protocol support, transactions, etc.), you’ll be well-prepared to upgrade your Classic brokers to Ursa brokers. StreamNative will be there to help guide this journey – from sizing the new cluster to orchestrating a cutover – but thanks to this unified storage layer, the journey will be much smoother than a conventional migration.
(As an analogy, consider how cloud databases allow storage to be detached from compute: you can spin up a new compute engine against the same storage. Similarly, Ursa Engine can be “attached” to the storage your Classic Pulsar has been populating, making the upgrade a swap of compute layers rather than a migration of data.)
It’s also worth noting that data governance and catalog integration become easier with this approach. Since your Classic cluster’s data is in Ursa stream storage format, it can be cataloged in systems like Snowflake or Databricks even before you move to Ursa Engine. This brings immediate benefits: for example, you could register your Pulsar topics (now as Iceberg tables) in Databricks Unity Catalog or Snowflake’s Open Catalog, enabling consistent data governance and discovery across streaming and batch worlds. Then, when you transition to Ursa Engine, those integrations remain in place – your data was already in the right format and cataloged under a unified schema. In short, Ursa storage on Classic not only eases the technical migration, but also bridges the gap in how streaming data is used in the broader data ecosystem.
Just as we introduced UniLink for Kafka users (a tool to live-replicate Kafka topics into Ursa Engine with zero downtime) to simplify their path forward, this Lakehouse storage extension serves the Pulsar community’s path to the future. We want every Pulsar user to confidently step into Ursa’s world, at their own pace, and with zero regret.
Towards a Unified Standard for Streaming Data
Beyond just StreamNative or Pulsar, we believe that Ursa’s approach heralds a broader industry shift – one that makes open data formats the backbone of streaming. By leveraging open table formats and cloud object storage as the substrate for streaming data, Ursa effectively turns streaming systems into an integral part of the data lakehouse architecture.
Looking ahead, we anticipate that the lakehouse-native streaming approach can be applied not only in StreamNative’s managed platform, but also in open-source Apache Pulsar and even Apache Kafka environments. The benefits of decoupling storage and compute, and using open formats, are not exclusive to Pulsar or Ursa – they are universal.
Ursa’s availability across every deployment marks a new chapter for StreamNative and our users. Whether you’re a long-time Pulsar user on our Classic Engine or a new user looking for cutting-edge streaming, there is now a clear, incremental path to the future. We invite all our customers to try out the Ursa storage extension on their Classic clusters and start experiencing the benefits of a lakehouse-native streaming architecture. We believe this advancement will not only empower our users with immediate improvements (cost savings, analytics integration, easier migrations), but also accelerate the industry’s move toward more open, unified, and intelligent data streaming systems.
The journey from Classic to Ursa represents more than just an upgrade – it’s the convergence of two worlds (fast streams and durable tables) into one powerful platform. We’re incredibly excited to see what you build with it. Here’s to ushering in the next era of streaming data, together!