March 17, 2025
15 min

Announcing Ursa Engine GA on AWS: Leaderless, Lakehouse-Native Data Streaming That Slashes Kafka Costs by 95%

Sijie Guo
Co-Founder and CEO, StreamNative

We’re excited to announce a major milestone in the evolution of cloud-native data streaming: Ursa Engine is now Generally Available on StreamNative BYOC for AWS! Built to fulfill the promise of the Streaming Augmented Lakehouse, Ursa Engine is the first and only Kafka-compatible data streaming engine purpose-built for AI-ready data lakehouses in cloud-native environments. It streams data directly into your lakehouse, augmenting it with real-time capabilities and slashing infrastructure costs by up to 95% compared to traditional Kafka deployments.

In tandem with our GA release, we’re proud to share that Ursa Engine now natively integrates with your data wherever it lives, whether tables are stored in Amazon S3 Tables or registered in Databricks Unity Catalog and Snowflake Open Catalog, providing organizations with end-to-end data governance for both streaming and batch workloads.

The Streaming Augmented Lakehouse: Why It Matters

Traditional data ecosystems often require multiple, separate infrastructures: one for real-time data streaming (e.g., Kafka or Pulsar) and another for batch processing via data lakehouses (e.g., Delta Lake, Iceberg). This split environment not only complicates governance, schema management, and data discovery; it also inflates infrastructure costs through repeated data transfers and storage, complex ETL processes, and error-prone, duplicated schema mapping. Specifically, organizations face:

  • Costly Data Transfers: Frequent cross-system data movement drives up infrastructure expenses.
  • Fragmented Governance: Duplicating access policies, security settings, and lineage tracking across multiple platforms leads to inconsistencies.
  • Operational Complexity: Running two or more separate systems for data streaming and lakehouses is labor-intensive.
  • Data Silos: Maintaining consistent data sets across streaming, warehouse, and lakehouse environments is resource-heavy and prone to errors.

Ursa Engine solves these challenges by augmenting the lakehouse with Kafka-compatible data streaming capabilities, leveraging open storage formats like Delta Lake and Iceberg, and unifying governance through catalog integrations. The result: real-time AI and analytics without the overhead of siloed data pipelines or expensive multi-system architectures.

General Availability on StreamNative BYOC for AWS

Ursa Engine is now officially GA on StreamNative BYOC (Bring Your Own Cloud) for AWS, giving organizations the freedom to deploy Ursa in their own cloud environment—while offering a fully integrated approach to streaming data into lakehouses. Key benefits include:

  1. 10x Infrastructure Cost Reduction (Up to 95% Savings)
    Ursa’s leaderless architecture eliminates inter-AZ data transfer overhead and leverages lakehouse-native storage, driving down costs significantly. Read our cost benchmark report to see how Ursa sustains a 5GB/s Kafka workload at just 5% of the cost of traditional streaming engines like Kafka and Redpanda.
  2. Kafka Protocol Compatibility
    Retain your existing Kafka clients and applications without rewriting code (see the client sketch at the end of this section).
  3. Latency-Relaxed Workloads
    Strike the ideal balance between throughput, performance, and cost-effectiveness, especially for AI & analytics scenarios that don’t require single-digit millisecond latencies.
  4. Instant Lakehouse Availability
    Make data instantly accessible in open-standard formats (e.g., Iceberg, Delta) by leveraging native lakehouse integration, removing extra ETL processes and data movement.
  5. Unified Governance
    Maintain consistent security, lineage, and access policies, along with seamless discovery through native integration with Unity Catalog and Iceberg REST Catalog, unifying data access across both real-time and batch domains.
  6. Usage-Based Pricing
    Leverage Elastic Throughput Units (ETUs) to pay only for throughput, significantly lowering total cost of ownership compared to traditional streaming platforms.

“As a longtime StreamNative customer, I couldn’t be more excited about the new Ursa Engine GA. Our evaluation shows it to be 10x more cost-efficient than other Kafka solutions. Everything is seamlessly written to object storage, automatically compacted into Iceberg tables, and made immediately available for our data teams using Snowflake.” —Christos A, Enterprise Architect at a Fortune 500 company

By adopting Ursa Engine on StreamNative BYOC, customers can consolidate their data infrastructure—reducing both costs and complexity—while unifying streaming and batch processing into one cohesive ecosystem.
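
Here is the client sketch promised above: a minimal example, assuming a SASL/PLAIN-authenticated endpoint, of an existing Python producer (confluent-kafka) pointed at an Ursa-backed StreamNative cluster. The bootstrap address, credential format, and topic name are placeholders; your actual values come from your StreamNative cluster settings.

    # Minimal sketch: an unchanged Kafka producer pointed at an Ursa-backed endpoint.
    # The bootstrap server, credentials, and topic name below are placeholders.
    from confluent_kafka import Producer

    producer = Producer({
        "bootstrap.servers": "your-cluster.aws.streamnative.cloud:9093",  # placeholder endpoint
        "security.protocol": "SASL_SSL",
        "sasl.mechanism": "PLAIN",
        "sasl.username": "your-tenant/your-namespace",  # placeholder; depends on your auth setup
        "sasl.password": "token:<your-api-key>",        # placeholder API key
    })

    # Application code is unchanged: produce exactly as you would against Apache Kafka.
    producer.produce("orders", key="order-123", value='{"user_id": 42, "amount": 19.99}')
    producer.flush()

Consumers work the same way; only the connection properties change, not the application logic.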

Reduce Infrastructure Costs by 10x with Leaderless Architecture and Lakehouse-Native Storage

A key differentiator of Ursa Engine is its leaderless architecture, which leverages the lakehouse as shared storage and Oxia as a scalable index/metadata manager. This approach eliminates expensive inter-AZ replication traffic and the broker-to-broker data copying that leader-based designs require. In a recent benchmark, Ursa sustained a 5GB/s Kafka workload for just $54 per hour, 95% cheaper than vanilla Kafka and Redpanda.
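
To put that in perspective, here is a rough back-of-the-envelope estimate, using our own illustrative assumptions rather than the benchmark’s methodology, of what cross-AZ traffic alone costs a conventional leader-based Kafka deployment at the same 5GB/s, assuming replication factor 3 across three availability zones and AWS’s published inter-AZ transfer rate of roughly $0.01/GB in each direction:

    # Back-of-the-envelope estimate of inter-AZ transfer cost for a leader-based
    # Kafka cluster at 5 GB/s. All figures are illustrative assumptions, not
    # numbers taken from the Ursa benchmark report.
    throughput_gb_per_s = 5.0
    inter_az_cost_per_gb = 0.02  # ~$0.01/GB out + ~$0.01/GB in (AWS same-region, cross-AZ)

    produce_cross_az = throughput_gb_per_s * (2 / 3)  # producers usually sit in a different AZ than the partition leader
    replicate_cross_az = throughput_gb_per_s * 2      # the leader ships every byte to two followers in other AZs
    consume_cross_az = throughput_gb_per_s * (2 / 3)  # consumers read from the leader across AZs (no follower fetching)

    total_gb_per_s = produce_cross_az + replicate_cross_az + consume_cross_az
    hourly_cost = total_gb_per_s * 3600 * inter_az_cost_per_gb
    print(f"~${hourly_cost:,.0f}/hour in inter-AZ transfer alone")  # roughly $1,200/hour

Because a leaderless engine writes to shared object storage instead of replicating across zones, that line item largely disappears, and that, together with cheaper lakehouse-native storage, is where the bulk of the benchmarked savings comes from.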

In addition, Ursa Engine is the first and only data streaming solution that natively implements its storage engine using open lakehouse formats, supporting both Iceberg and Delta Lake. By embedding data schemas directly into the storage layer, Ursa takes advantage of columnar compression, enabling a potential 10x or greater reduction in storage footprint.
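
Schema-aware ingestion is what makes that columnar layout possible. As a hypothetical illustration only, a producer might register an Avro schema and send structured records as below; this assumes your cluster exposes a Confluent-compatible schema registry endpoint, and the registry URL, bootstrap address, topic, and field names are all placeholders.

    # Hypothetical sketch: producing schema-tagged Avro records so the engine can
    # lay data out as typed columns. Assumes a Confluent-compatible schema registry;
    # the URLs, topic, and field names are placeholders.
    from confluent_kafka import SerializingProducer
    from confluent_kafka.schema_registry import SchemaRegistryClient
    from confluent_kafka.schema_registry.avro import AvroSerializer
    from confluent_kafka.serialization import StringSerializer

    order_schema = """
    {
      "type": "record",
      "name": "Order",
      "fields": [
        {"name": "user_id", "type": "long"},
        {"name": "amount", "type": "double"}
      ]
    }
    """

    registry = SchemaRegistryClient({"url": "https://<schema-registry-endpoint>"})  # placeholder
    producer = SerializingProducer({
        "bootstrap.servers": "<bootstrap-endpoint>",  # placeholder
        "key.serializer": StringSerializer("utf_8"),
        "value.serializer": AvroSerializer(registry, order_schema),
    })

    producer.produce(topic="orders", key="order-123", value={"user_id": 42, "amount": 19.99})
    producer.flush()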

Unlike other “Iceberg integrations” (e.g., Redpanda Iceberg Topics), where two copies of data are maintained—one in proprietary storage and another in the lakehouse—Ursa stores data just once, cutting complexity and eliminating inconsistencies.

By embracing open lakehouse formats and avoiding leader-based inter-zone data replication, Ursa delivers up to a 10x reduction in infrastructure costs compared to traditional streaming solutions.

Interested in how we achieved these savings? Check out our blog post on “Why Leaderless Architecture and Lakehouse-Native Storage for Reducing Kafka Cost”.

Unified Governance with Unity Catalog, Snowflake Open Catalog & AWS S3 Tables

Another major differentiator is that Ursa Engine natively integrates with popular data catalogs that support Iceberg and/or Delta Lake, bringing real-time streaming and batch data together under a single governance model. Specifically, Ursa Engine connects with:

  1. Databricks Unity Catalog – Delivering uniform access controls and lineage across streaming and batch data, eliminating the need to maintain multiple parallel security configurations.

  2. Snowflake Open Catalog – Allowing organizations to discover and govern real-time data—stored in open table formats like Iceberg—alongside Snowflake’s analytical workloads.

  3. AWS S3 Tables – Ursa Engine can stream data directly into Amazon S3 Tables, leveraging Iceberg’s REST catalog to ensure centralized metadata, efficient storage optimization, and seamless querying via AWS analytics services.
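
As a concrete illustration of that last integration, the sketch below reads a topic’s compacted Iceberg table through the Amazon S3 Tables Iceberg REST endpoint using PyIceberg. The region, account ID, table bucket ARN, namespace, and table name are placeholders, and the catalog properties follow the AWS S3 Tables documentation for PyIceberg at the time of writing.

    # Minimal sketch: reading an Iceberg table (compacted from a Kafka topic)
    # through the Amazon S3 Tables REST catalog endpoint with PyIceberg.
    # Region, account, bucket ARN, namespace, and table name are placeholders.
    from pyiceberg.catalog import load_catalog

    catalog = load_catalog(
        "s3tables",
        **{
            "type": "rest",
            "uri": "https://s3tables.us-east-1.amazonaws.com/iceberg",
            "warehouse": "arn:aws:s3tables:us-east-1:111122223333:bucket/my-table-bucket",
            "rest.sigv4-enabled": "true",
            "rest.signing-name": "s3tables",
            "rest.signing-region": "us-east-1",
        },
    )

    # The same table is visible to any engine registered against the catalog.
    table = catalog.load_table("streaming.orders")
    print(table.scan().to_arrow())  # reads the table into an Arrow table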

By registering your Kafka topics as managed or external tables in these catalogs, you achieve:

  • Centralized Policies & Access Control
    Define and apply consistent security, lineage, and compliance rules once, instead of duplicating them across multiple systems.

  • Schema & Metadata Discovery
    A single “source of truth” for data definitions in both real-time streaming and batch environments, boosting data reliability and usability.

  • Reduced Data Silos
    Break down barriers between streaming and analytics teams; everyone has a unified view of the data, enabling faster insights and easier collaboration.

  • Open Standard Formats
    Since Ursa Engine writes data in Iceberg or Delta Lake by default, any compatible downstream engine—Databricks, Snowflake, AWS Athena, and more—can instantly query your latest streaming data.
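
For example, once the table is visible to the AWS Glue Data Catalog (an assumption in this sketch), querying the latest streaming data from Athena is a single SQL statement; the database, table, and results-bucket names below are placeholders.

    # Minimal sketch: querying the latest streaming data from Athena via boto3.
    # Database, table, and result-bucket names are placeholders; this assumes the
    # Iceberg table is visible to Athena through the Glue Data Catalog.
    import boto3

    athena = boto3.client("athena", region_name="us-east-1")

    response = athena.start_query_execution(
        QueryString=(
            "SELECT user_id, amount, event_time "
            "FROM orders ORDER BY event_time DESC LIMIT 10"
        ),
        QueryExecutionContext={"Database": "streaming_lakehouse"},
        ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
    )
    print("Started query:", response["QueryExecutionId"])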

With this native data catalog integration, Ursa stores a single copy of your data in your own bucket, fully discoverable and shareable, and seamlessly provides access across Databricks, Snowflake, AWS Athena, and more. No more juggling siloed data copies or ballooning transport costs. It turns the “separate worlds” of streaming and batch data into a single ecosystem, minimizing complexity while maximizing governance, security, and discoverability.

ETU Pricing Model: Pay for Throughput, Not Storage

Lastly, while traditional streaming platforms often bundle storage and throughput costs, Ursa Engine introduces Elastic Throughput Units (ETUs)—a usage-based pricing model that charges only for throughput, with no storage fees.

  • Transparent & Predictable: Scale your workload as needed without hidden storage charges.
  • 50% Lower Cost than Confluent WarpStream: Lower your total cost of ownership (TCO) while maintaining robust performance and reliability. Check out the pricing difference in this blog post.

Getting Started with Ursa Engine

Ready to take your data architecture into the era of real-time AI? Here’s how you can get started:

🚀 [Sign Up for Ursa Engine on StreamNative BYOC]
Deploy in your preferred cloud environment, configure latency-relaxed Kafka workloads, and streamline data ingestion into your lakehouse.

📖 [Explore Our Documentation]
Learn how to configure Ursa Engine with Databricks Unity Catalog, Snowflake Open Catalog, and/or AWS S3 Tables to maintain a single governance model from ingestion to analytics.

📞 [Contact Us for a Demo]
See how Ursa Engine optimizes Kafka workloads and simplifies lakehouse integration—reducing complexity and operational overhead.

🎥 [Watch our on-demand workshop]
This session, “Augment Your Lakehouse with Streaming Capabilities for Real-Time AI,” gives an end-to-end overview of StreamNative’s integration with Databricks Unity Catalog.

Thank you for joining us on this journey to redefine real-time data streaming standards. With the General Availability of Ursa Engine on BYOC for AWS, complete with integrations for Unity Catalog, Snowflake Open Catalog, and AWS S3 Tables, you can unify governance, cut costs, and streamline your data ingestion, all in one place.

We look forward to seeing the innovative applications and solutions you’ll build with Ursa Engine!

Sijie Guo
Sijie’s journey with Apache Pulsar began at Yahoo! where he was part of the team working to develop a global messaging platform for the company. He then went to Twitter, where he led the messaging infrastructure group and co-created DistributedLog and Twitter EventBus. In 2017, he co-founded Streamlio, which was acquired by Splunk, and in 2019 he founded StreamNative. He is one of the original creators of Apache Pulsar and Apache BookKeeper, and remains VP of Apache BookKeeper and PMC Member of Apache Pulsar. Sijie lives in the San Francisco Bay Area of California.
