Streaming Lakehouse
Real‑time streams. Open tables. One system.
Write streaming data directly into open table formats (Iceberg/Delta) and query it in seconds—without Kafka/Pulsar connector pipelines or dual systems. Drop‑in Kafka/Pulsar compatibility included.
Core Benefits
<1s ingest-to-consume
Immediacy
Sub-second ingest-to-consume freshness—events become visible within ~1s, enabling fraud detection, recommendations, and operational monitoring to run on fresh data (see the sketch below).
1 copy of data
Architectural simplicity
Eliminate Kafka-to-Lakehouse ETL, offset juggling, and recovery playbooks. One system powers streams and tables with unified metadata, data storage, and governance.
95% lower cost
Cost efficiency
Streams land in object storage as Iceberg/Delta tables—no broker disks, duplicate copies, or inter-AZ replication—so compute scales independently and costs drop by up to 95%.
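Concretely, the whole path can be exercised with standard tooling. A minimal sketch, assuming a Kafka-compatible endpoint and an Iceberg catalog named `lake` (the endpoint, topic, and table names are placeholders, not real defaults):

```python
from confluent_kafka import Producer
from pyspark.sql import SparkSession

# Produce with a stock Kafka client. The endpoint below is an
# illustrative placeholder for your Kafka-compatible address.
producer = Producer({"bootstrap.servers": "ursa.example.com:9092"})
producer.produce("payments", key=b"txn-42", value=b'{"amount": 99.5}')
producer.flush()  # returns once the event is durably written

# Seconds later the same event is a table row, readable by any
# Iceberg/Delta-capable SQL engine (Spark shown here).
spark = SparkSession.builder.getOrCreate()
spark.sql("SELECT * FROM lake.payments LIMIT 10").show()
```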
Why Streaming Lakehouse
End the "two‑systems" tax
Traditional stacks run Kafka/Pulsar beside a Lakehouse and sync via connectors. That doubles operations, creates stale data, and breaks the "single source of truth." Streaming Lakehouse folds streaming into the Lakehouse so new data is immediately queryable as a table—no staging topics, no micro-batches, no fragile connectors.
Stream‑table duality
Each event is both an ordered stream record and a row in an Iceberg/Delta table—one copy, two views. Low-latency consumers read the stream while SQL engines query the same bytes immediately, with consistent offsets, schema, debugging, and replay.
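A hedged sketch of the two views: a stock Kafka consumer tails the stream while any SQL engine scans the same data as a table (endpoint, group, and topic names are illustrative):

```python
from confluent_kafka import Consumer

# Stream view: ordinary offsets, consumer groups, and replay semantics.
consumer = Consumer({
    "bootstrap.servers": "ursa.example.com:9092",  # placeholder endpoint
    "group.id": "fraud-detector",
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["payments"])

msg = consumer.poll(5.0)
if msg is not None and msg.error() is None:
    print(msg.offset(), msg.value())  # the same bytes a SQL scan would read

# Table view: no connector in between, e.g. (engine-specific):
#   SELECT count(*) FROM lake.payments
consumer.close()
```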
Anatomy of a Streaming Lakehouse
A three-layer model: Data · Metadata · Protocol
Learn more →
1. Data layer: Stream format (WAL → Parquet, open tables)
Events land durably in a write-ahead log and compact into Parquet with atomic catalog updates. Query engines see fresh + historical data through a union read path. Choose latency- or cost-optimized mode per stream.
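The union read path can be pictured as one continuous scan stitched from two sources: Parquet files committed to the catalog, then the uncompacted WAL tail. A purely illustrative sketch, not Ursa's actual internals; `snapshot` and `wal` are hypothetical objects:

```python
def union_read(snapshot, wal):
    """Illustrative union read: committed Parquet rows first, then the
    fresh WAL tail, presented to the query engine as one table scan."""
    # 1) Rows already compacted into Parquet and committed atomically
    #    to the catalog (the historical portion).
    for row in snapshot.scan_parquet():
        yield row
    # 2) Events still only in the write-ahead log. The snapshot records
    #    the last offset it covers, so no row is returned twice.
    for record in wal.read(start=snapshot.last_offset + 1):
        yield record.as_row()
```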
2. Metadata layer: Stream catalog (offset index + governance)
A streaming-aware catalog tracks schemas and a streaming offset index that maps offsets to WAL/Parquet files. That enables high-performance ingestion and unified governance across streams and tables.
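Conceptually, the offset index is a range map from offset intervals to physical files: WAL segments for fresh data, Parquet files once compacted. A toy sketch with made-up paths:

```python
from bisect import bisect_right

# Each entry: (first_offset_in_file, file_path, kind). Offsets are
# contiguous, so the entry covering an offset is the last one whose
# start is at or below it. All paths here are made up.
index = [
    (0,      "s3://lake/payments/data/part-000.parquet", "parquet"),
    (10_000, "s3://lake/payments/data/part-001.parquet", "parquet"),
    (20_000, "s3://lake/payments/wal/segment-17.wal",    "wal"),
]

def locate(offset: int):
    """Map a stream offset to the file that holds it."""
    starts = [first for first, _, _ in index]
    i = bisect_right(starts, offset) - 1
    if i < 0:
        raise KeyError(f"offset {offset} precedes retained data")
    return index[i]

print(locate(15_321))  # -> (10000, 's3://.../part-001.parquet', 'parquet')
```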
3. Protocol layer: Streaming API (stateless, multi-protocol)
Stateless services speak Kafka or Pulsar protocols and translate client calls to storage operations. Because brokers are stateless, you scale compute and storage independently, add capacity in seconds, and keep drop-in client compatibility.
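Multi-protocol means the same stream is reachable through either client library; the protocol layer maps both onto the same storage operations. A hedged sketch with a placeholder Pulsar endpoint:

```python
import pulsar

# A stock Pulsar client produces to the same underlying stream that
# Kafka clients and SQL engines read; the service URL is a placeholder.
client = pulsar.Client("pulsar://ursa.example.com:6650")
producer = client.create_producer("persistent://public/default/payments")
producer.send(b'{"amount": 12.0}')
client.close()
```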
Powered by StreamNative Ursa
StreamNative Ursa is the reference implementation of the Streaming Lakehouse blueprint—pairing leaderless, stateless brokers with an object-store WAL to turn Kafka streams into Iceberg/Delta tables with high-performance ingestion, elastic scale, predictable latency, and lower cost.
Read the Ursa paper →
Leaderless & Diskless
Ursa decouples compute from storage—brokers are leaderless and hold no local disks, eliminating elections, rebalancing, and hot partitions. Failover is instant, scale is elastic, and durability comes from a shared WAL plus object storage for predictable latency and lower cost.
Stream-as-table storage
Events land in a durable write-ahead log and compact into Parquet with atomic catalog commits. A range-based offset index keeps streams and tables in lockstep, enabling exactly-once ingestion, time travel, and a single copy of data in Iceberg/Delta.
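Because the result is a plain Iceberg/Delta table, standard snapshot features apply unchanged. For instance, Iceberg time travel through Spark SQL (table name, timestamp, and snapshot id are illustrative):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Standard Iceberg time travel: read the table as of an earlier moment...
spark.sql(
    "SELECT count(*) FROM lake.payments TIMESTAMP AS OF '2025-01-01 00:00:00'"
).show()

# ...or pin an exact snapshot id from the table's commit history.
spark.sql("SELECT * FROM lake.payments VERSION AS OF 123456789").show()
```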
Kafka-compatible, flexible ingest
Drop in with existing Kafka clients—idempotent producers, transactions, consumer groups—while stateless brokers translate protocol calls to the storage engine. Choose latency-optimized WAL or cost-optimized direct object-store writes to meet each workload’s SLOs and budget.
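Those client features work through the standard Kafka API. A hedged sketch of an idempotent, transactional producer (endpoint and ids are placeholders):

```python
from confluent_kafka import Producer

producer = Producer({
    "bootstrap.servers": "ursa.example.com:9092",  # placeholder endpoint
    "enable.idempotence": True,                    # de-duplicated writes
    "transactional.id": "orders-writer",           # enables transactions
})

producer.init_transactions()
producer.begin_transaction()
producer.produce("orders", key=b"o-1001", value=b'{"status": "paid"}')
producer.commit_transaction()  # atomically commits the batch
```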
Compare Streaming Architectures
How Streaming Lakehouse stacks up against Kafka → ETL → Lakehouse, Kafka tiered storage, and streaming databases across data copies, freshness, compatibility, analytics, and ops.
| Feature            | Streaming Lakehouse      | Kafka → ETL → Lakehouse  | Kafka Tiered Storage     | Streaming DBs            |
|--------------------|--------------------------|--------------------------|--------------------------|--------------------------|
| Data copies        | 1 (open table)           | Multiple                 | 1 (proprietary log)      | 2 (replicate to tables)  |
| Freshness to query | Seconds                  | Minutes–hours            | Not table-native         | Varies                   |
| Ingestion protocol | Kafka or Pulsar          | Kafka or Pulsar          | Kafka only               | Custom                   |
| Query engine       | Any SQL on Iceberg/Delta | SQL after ETL            | Not columnar             | Vendor-specific          |
| Ops                | Single system            | Two systems + connectors | Kafka ops + object store | New system + replication |
Why Streaming Augmented Lakehouse (SAL) wins
One system, one copy. SAL writes streams directly to Iceberg/Delta as Parquet and keeps a unified catalog + streaming offset index, so the same bytes serve streams and tables. No connectors, less to break, lower cost, and immediate analytics.
More to explore
On-demand webinar: Streaming Lakehouse
A quick walkthrough of the three-layer architecture, real-world use cases, and live demos.
Watch now →
Blog series: De-composing Streaming Systems
Why streams need their Iceberg moment—data, metadata, and protocol explained with diagrams.
Read the series →
Talk to an expert
Get 1:1 guidance on architecture, migration paths, and POC scoping for your environment.
Contact us →
FAQ
Don’t see an answer to your question? Check our docs, or contact us directly.
Can I use my existing Kafka clients?
Yes. StreamNative Ursa is Kafka-protocol compatible, so you can migrate apps without code changes.
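In practice the migration is usually a configuration change only (hostnames below are placeholders):

```python
# Before: pointing at a self-managed Kafka cluster.
conf = {"bootstrap.servers": "kafka-1.internal:9092,kafka-2.internal:9092"}

# After: the unchanged application targets the Kafka-compatible endpoint.
conf = {"bootstrap.servers": "ursa.example.com:9092"}
```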
Do I still need Kafka Connect or Pulsar IO?
Not to land streams in open tables—Ursa writes directly to Delta/Iceberg. Use Kafka Connect or Pulsar IO when integrating with other external systems.
Which table formats do you support?
Delta Lake and Apache Iceberg.
What latency can I expect?
Classic Engine uses low-latency BookKeeper storage for latency-sensitive workloads; Ursa’s cost-optimized S3 WAL targets sub-second writes, typically ~200–500 ms, trading a bit of latency for major cost savings.
How do you handle failover and scaling?
Ursa’s brokers are leaderless and stateless—any broker can serve produce/fetch—so failover is immediate, capacity can be added in seconds without partition rebalancing, and inter-AZ replication traffic is reduced.
Build your Streaming Lakehouse.
Unify streams and tables on open formats. Ship real‑time products faster—at a fraction of the cost.