September 30, 2025
8 min read

Q3 2025 Data Streaming Launch: Lakehouse Streaming, Governed Analytics, and Event-Driven Agents

Kundan Vyas
Staff Product Manager, StreamNative

Modern data teams face a three-stage bottleneck: data streaming is getting more expensive to run at scale, insights stall when streaming and analytics live in separate systems, and actions are delayed because AI can’t reliably operate on live context. This quarter’s Data Streaming launch tackles that end-to-end path with one coherent story: stream once, store in the open, govern centrally, make data immediately queryable for analytics, and operationalize real-time agents that act with confidence.

We’re announcing four upgrades that fit together as a single architectural arc. Ursa’s lakehouse storage becomes available across every Classic Engine cluster, so you can adopt lakehouse economics without a disruptive migration. StreamNative Cloud’s Unity Catalog integration expands with managed Apache Iceberg tables, turning event streams into governed, query-ready Iceberg tables in Databricks. RBAC reaches General Availability in StreamNative Cloud, bringing least-privilege access control to multi-tenant streaming. And Orca, our Agent Engine, enters Private Preview, giving enterprises an event-driven runtime where autonomous agents live on the same backbone as your data. Taken together, these updates lower total cost of ownership, collapse data silos, and make real-time AI practical in production.

Ursa Everywhere: the lakehouse-native path to data streaming—now for every Classic cluster

When we introduced Ursa, we set out to deliver a streaming engine that preserves the Kafka developer experience while fundamentally rethinking the storage and replication economics underneath. Ursa writes streams directly to cloud object storage in open table formats—think Apache Iceberg or Delta—rather than replicating messages across broker disks and exporting them later via external connectors. By eliminating leader-based broker replication and the “second pipeline” required to feed your data lake, Ursa’s architecture can reduce infrastructure costs by an order of magnitude while decoupling compute from storage for elastic scaling. That design has now moved from paper to practice; our VLDB 2025 Best Industry Paper recognition validates the approach and its impact at scale.

Today we’re taking the next step: Ursa’s storage layer is available as a lakehouse tier for all Classic Engine clusters—Serverless, Dedicated, and BYOC. You keep Pulsar’s ultra-low-latency hot path in BookKeeper for operational workloads, and you continuously persist history into Iceberg/Delta on S3, GCS, or Azure Blob as part of the same write. No extra ETL job, no duplicate pipeline, and no change for your producers or consumers. It’s the Classic Engine you trust, with the lakehouse durability and open format your analytics estate expects.
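To make “no change for your producers or consumers” concrete, here is a minimal sketch of a standard Pulsar producer in Python running against a Classic cluster with the lakehouse tier enabled. The service URL, topic, and payload are illustrative placeholders, and authentication setup is omitted for brevity:

```python
import pulsar

# A completely ordinary Pulsar producer: nothing here changes when the
# lakehouse tier is enabled. The broker serves the hot path from BookKeeper
# while the same stream is persisted to Iceberg/Delta on object storage.
# The service URL and topic below are placeholders for illustration.
client = pulsar.Client("pulsar+ssl://<your-cluster>.streamnative.cloud:6651")

producer = client.create_producer("persistent://public/default/orders")
for i in range(100):
    producer.send(f'{{"order_id": {i}, "status": "created"}}'.encode("utf-8"))

client.close()
```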

This is more than offload. By standardizing on the Ursa streaming storage format beneath your Classic clusters, you’re laying the tracks for an effortless upgrade. When your workload profile or cost targets point to Ursa brokers, you attach the new engine to the same object storage and take over serving from day one: no re-ingest, no backfill, no big-bang cutover. The protocols your apps see (Pulsar or Kafka) don’t change; only the engine does. It’s a clean swap of compute against a shared storage substrate that’s already been populated by your Classic clusters.

If your priority is TCO and lakehouse integration, this is the most pragmatic route to the lakehouse ecosystem today and a seamless Ursa upgrade tomorrow. Enable the Ursa storage tier on your Classic clusters, watch your data flow into your lakehouse, and query it there, analytics-ready the moment it lands.

Get started: turn on Lakehouse Storage for a Classic cluster in StreamNative Cloud and pick your object store and table format. If you’d like architectural guidance or a cost-reduction analysis, our team can help you model the impact.

You can read the detailed announcement in this blog post.

Streaming that arrives governed: Unity Catalog with managed Iceberg tables

Streaming is at its best when it doesn’t fork your architecture. With Ursa, events are written into columnar Parquet files and committed to Iceberg tables; with Unity Catalog, those tables are immediately discoverable, governed, and queryable in the same catalog where the rest of your lakehouse lives. That means real-time data becomes a first-class citizen of your analytics estate the moment it lands.

In practice, this looks simple. You stream to Pulsar or Kafka. Ursa writes and compacts to Iceberg, publishing new snapshots as files arrive. Unity Catalog registers those tables and enforces access controls and lineage in line with your enterprise policies. Your analysts and data scientists reach for the same SQL endpoints and notebooks they use today, and they see fresh, governed streaming tables without bespoke glue code or separate pipelines.
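Concretely, once a topic lands as a managed Iceberg table in Unity Catalog, querying it is just Spark SQL. The sketch below assumes a hypothetical table name (main.streaming.orders) and a hypothetical ingestion-timestamp column (_ingested_at); your actual names depend on how you map topics to tables:

```python
from pyspark.sql import SparkSession

# In a Databricks notebook, `spark` already exists; getOrCreate() reuses it.
spark = SparkSession.builder.getOrCreate()

# Unity Catalog addresses tables as <catalog>.<schema>.<table>.
# `main.streaming.orders` and `_ingested_at` are hypothetical names for a
# topic that Ursa has published as a managed Iceberg table.
fresh_orders = spark.sql("""
    SELECT order_id, status, COUNT(*) AS events
    FROM main.streaming.orders
    WHERE _ingested_at > current_timestamp() - INTERVAL 5 MINUTES
    GROUP BY order_id, status
""")
fresh_orders.show()
```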

The payoff is speed and simplicity. BI dashboards and ML features no longer lag behind your operational reality. Governance does not regress the moment a pipeline becomes “real-time”. And because the tables are open Iceberg underneath, you keep maximum interoperability with Spark, Trino, Flink, and Snowflake, even as Unity Catalog provides a single place to manage discovery and permissions.

Try it now: connect your Ursa or Classic cluster to Databricks Unity Catalog, choose managed Iceberg, and publish a streaming topic as a table. You’ll go from events to governed SQL in minutes, not weeks.

You can read the detailed announcement in this blog post.

RBAC GA: least-privilege access for multi-tenant streaming

As organizations consolidate more teams and workloads onto a shared streaming backbone, centralized access control moves from nice-to-have to mandatory. Role-Based Access Control (RBAC) in StreamNative Cloud is now Generally Available, bringing a consistent security model to Pulsar- and Kafka-compatible endpoints.

RBAC gives platform owners a single place to define who can create tenants and namespaces, who can publish or subscribe to which topics, who can evolve schemas, and how service accounts should be scoped. Roles can reflect the way your org actually works—a platform team with broad administrative rights, domain teams with namespace-level controls, application services with topic-specific produce or consume permissions—and those roles can be applied across clusters without brittle, per-cluster ACL sprawl. Changes are auditable. Rollouts are predictable. And the principle of least privilege becomes practical instead of aspirational.

You can manage RBAC interactively in the Console or declaratively via API and Terraform as part of your CI/CD flows. Either way, you get a uniform security posture across protocols and deployments, so consolidating onto one platform doesn’t mean compromising on governance.
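As a rough sketch of the declarative path, the snippet below binds a topic-scoped consumer role to a service account over REST. The endpoint path, payload shape, and role name are illustrative assumptions, not the documented StreamNative Cloud API; consult the API reference or the Terraform provider for the real resource schemas:

```python
import requests

# Hypothetical endpoint and payload shape, for illustration only.
API_BASE = "https://api.streamnative.cloud"  # placeholder
TOKEN = "<service-account-token>"            # placeholder

role_binding = {
    # Scope the role to a single topic: least privilege in practice.
    "subject": "serviceaccount:checkout-app",
    "role": "topic-consumer",
    "resource": "persistent://ecommerce/orders/order-events",
}

resp = requests.post(
    f"{API_BASE}/v1/rolebindings",  # hypothetical path
    json=role_binding,
    headers={"Authorization": f"Bearer {TOKEN}"},
    timeout=10,
)
resp.raise_for_status()
```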

Enable it today: open the Accounts & Accesses → Access section in the Console, assign your first roles, and replace ad-hoc ACLs with a model that scales with your organization.

You can read the detailed announcement in this blog post.

Orca Private Preview: a backbone for real-time, event-driven, autonomous agents

Enterprises have experimented with agent frameworks in notebooks and prototypes. The sticking point has been production: agents need live context, persistent memory, safe tool use, coordination, and observability—all the operational traits we already demand from distributed systems. Orca is our answer: an event-driven Agent Engine that runs on the same streaming fabric as your data so agents become long-lived services rather than one-off invocations.

With Orca, agents subscribe to topics, react to events as they happen, and emit new events for downstream agents and applications. They maintain persistent state so knowledge accumulates across sessions. They discover and call tools or peer agents via a shared registry. And because every observation and action flows through durable event logs, you gain traceability and auditability: you can replay inputs, inspect memory, and understand why an agent did what it did. It’s the operational backbone autonomous systems have been missing.

This is infrastructure, not another framework. Bring the agents you already have—Python today, including those built with OpenAI or Google ADK—and let Orca host them as event-driven functions. Under the hood, the engine leans on battle-tested StreamNative primitives for scale, concurrency, and failure recovery, so you can focus on behavior and policy rather than plumbing and retries.
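To give a feel for the pattern (not Orca’s actual hosting API, which the Private Preview documents), here is the subscribe-react-emit loop written against the plain Pulsar Python client; the topics, subscription name, and decision logic are placeholders:

```python
import pulsar

# Conceptual shape of an event-driven agent: subscribe, react, emit.
# Orca hosts this loop for you as an event-driven function; the client
# setup below just illustrates the pattern with placeholder names.
client = pulsar.Client("pulsar://localhost:6650")

consumer = client.subscribe(
    "persistent://public/default/ops-alerts",
    subscription_name="triage-agent",
)
producer = client.create_producer("persistent://public/default/ops-actions")

def decide(alert: bytes) -> bytes:
    """Stand-in for the agent's reasoning step (e.g., an LLM call)."""
    return b'{"action": "escalate"}' if b"critical" in alert else b'{"action": "ack"}'

while True:
    msg = consumer.receive()
    try:
        producer.send(decide(msg.data()))
        consumer.acknowledge(msg)           # the durable log keeps inputs replayable
    except Exception:
        consumer.negative_acknowledge(msg)  # redeliver on failure
```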

Orca is available in Private Preview on StreamNative Cloud starting with BYOC, with additional deployment modes to follow. Early adopters are already using it to watch operational streams and take first-line actions, to orchestrate multi-agent workflows that triage and escalate based on data, and to close the loop between analytics signals and operational responses.

Try Orca: ask your account team to enable the Agent Engine preview, deploy a Python agent against a live topic, and watch it come alive in the stream. Our quickstarts get you from code to continuously running agent in minutes.

One platform, one path from data → insights → actions

The industry doesn’t need a patchwork of disjointed systems to move from raw events to business outcomes. It needs a single platform that lowers the cost of data streaming, makes insights immediately available under governance, and turns those insights into real-time actions. That’s the through-line of this launch. With Ursa storage available across Classic clusters, you get lakehouse economics now and a clean, no-migration path to Ursa brokers when you’re ready. With our expanded Unity Catalog integration for managed Iceberg tables, streams land as governed, query-ready assets the moment they arrive. With RBAC now GA, access control shifts from ad-hoc scripts to a single, auditable model that fits how enterprises actually operate. And with Orca in Private Preview, autonomous agents can finally live on the same backbone as your data—perceiving, deciding, and acting in real time.

Start where your bottleneck is sharpest. If you’re pushing for lakehouse integration and cost relief, enable the Ursa lakehouse tier on your Classic clusters and watch topic data from Pulsar or Kafka persist directly to Iceberg. If analytics friction is the blocker, connect your StreamNative cluster to Unity Catalog, publish your first topic, and run governed SQL over the table it creates—no second pipeline required. If security is the imperative, define roles once in RBAC and apply them uniformly across tenants, namespaces, topics, and schemas. And if your next step is AI, deploy an agent on Orca and let it live in the stream, with memory, observability, and guardrails from day one.

The path from data to insights to actions shouldn’t require a leap across tool silos. With StreamNative Cloud, it’s a continuous flow—practical, open, and intelligent—and it’s available today.

Kundan Vyas
Kundan is a Staff Product Manager at StreamNative, where he spearheads StreamNative Cloud, Lakehouse Storage, and the compute platform for connectivity, functions, and stream processing. Kundan also leads Partner Strategy at StreamNative, focusing on building strong, mutually beneficial relationships that enhance the company's offerings and reach.

Tags: Lakehouse, Ursa, Agentic AI, Unity Catalog, RBAC