Keynote - Data Streaming Summit San Francisco 2025

Session Overview

Join us to explore how Ursa, Orca, and the streaming lakehouse enable cost-efficient, real-time architectures that connect data, analytics, and AI agents across modern cloud environments.

From Data to Intelligence — Cost, Lakehouse, and AI Agents | Data Streaming Summit 2025 Keynote

How do we bridge streaming, analytics, and AI into a single, intelligent system that scales without runaway cost? At Data Streaming Summit 2025, StreamNative presented a bold new blueprint for the Agentic Era — a time when real-time data, lakehouse architectures, and AI agents converge into one unified, incremental stack.

This keynote session brings together leaders from StreamNative, Databricks, LinkedIn, and OpenAI to explore the technologies, architectures, and success stories shaping the future of intelligent data infrastructure.

🔷 A Unified Blueprint for the Agentic Era

The keynote opens with a connected story that doubles as a roadmap for the next decade of data infrastructure: Stream the data. Accelerate the insights. Empower the agents. From ingestion to decision-making, this architecture bridges motion (streaming), rest (lakehouse), and action (AI agents) — redefining how real-time intelligence is built and delivered.

⚙️ Ursa — The Lakehouse-Native Streaming Engine

StreamNative CTO Matteo Merli unveils Ursa, a new lakehouse-native engine built to bend the cloud cost curve. Ursa unifies low-latency streaming with cost-optimized object storage, eliminating connector overhead, cross-AZ replication costs, and complex synchronization layers. Now production-ready as a Pulsar storage extension, Ursa enables enterprise workloads to scale efficiently — achieving high throughput, low latency, and significant cost savings.

🧩 Unity Catalog + Iceberg — Streams as First-Class Tables

Kundan (StreamNative) and Michelle (Databricks) reveal how the Unity Catalog and Apache Iceberg integration transforms streaming topics into instantly queryable, governed tables. This native integration allows organizations to unify governance, enforce fine-grained access control, and eliminate data duplication — effectively bridging the gap between streaming and the lakehouse.

🤖 Orca — Bringing AI Agents into the Event Fabric

Neng Lu, Director of Platform Engineering at StreamNative, introduces Orca, a streaming-native runtime for AI agents. Python-first and framework-agnostic, Orca allows agents to be deployed and coordinated directly inside event streams, bringing state, governance, replay, and dynamic tool discovery to real-time AI systems. A live demo showcases how agents scale, recover, and collaborate autonomously — turning event streams into an intelligent, self-evolving fabric.

🏗️ Architectures That Validate the Vision — LinkedIn & OpenAI

Two global leaders shared how they are redefining large-scale streaming architectures:

LinkedIn unveiled Northguard, a log store handling 32 trillion records per day, with elasticity, self-balancing clusters, and segment-based replication.
OpenAI demonstrated how real-time streaming powers model training, experimentation, and conversational AI, achieving 80 GB/s throughput and 3× quarterly growth using Kafka, Flink, and custom infrastructure.

🚗 Proof in the Wild — Motorq’s Transformation Story

Motorq, a connected-vehicle intelligence platform, showcased tangible results after adopting StreamNative and Ursa:

50% lower streaming costs
Faster lakehouse ingestion
Near real-time analytics with simplified pipelines and reduced sync overhead

This proves how the new architecture delivers measurable efficiency and performance at scale.

🔥 Fireside Chat — Where the Lakehouse Goes Next

The keynote closes with an inspiring conversation between Sijie (StreamNative CEO) and Reynold Xin (Databricks Co-founder & Chief Architect). They discuss how governance, open formats, and single-file commit semantics will shape the future of the lakehouse — and why streaming-native architectures are essential for reliable, large-scale AI and agentic systems.

🎯 Takeaways:

Understand how real-time streaming, lakehouse, and AI agents converge
Learn cost-optimization strategies for large-scale cloud systems
Explore how leading organizations build next-generation data infrastructure
Gain insight into open standards and design principles for the Agentic Era

Whether you’re a data architect, AI engineer, or system designer, this keynote provides a comprehensive, forward-looking view of how data becomes intelligence — from motion to action, and from cost to insight.

About the Speakers

Sijie Guo Sijie’s journey with Apache Pulsar began at Yahoo! where he was part of the team working to develop a global messaging platform for the company. He then went to Twitter, where he led the messaging infrastructure group and co-created DistributedLog and Twitter EventBus. In 2017, he co-founded Streamlio, which was acquired by Splunk, and in 2019 he founded StreamNative. He is one of the original creators of Apache Pulsar and Apache BookKeeper, and remains VP of Apache BookKeeper and PMC Member of Apache Pulsar. Sijie lives in the San Francisco Bay Area of California.

Matteo Merli Matteo is the CTO at StreamNative, where he brings rich experience in distributed pub-sub messaging platforms. During his time at Yahoo!, Matteo was one of the co-creators of the global, distributed messaging system that would later become Apache Pulsar. Matteo is the PMC Chair of Apache Pulsar, where he helps to guide the community and ensure the success of the Pulsar project. He is also a PMC member for Apache BookKeeper. Matteo lives in Menlo Park, California.

Kundan Vyas Kundan is Director, Product & Partnerships at StreamNative, owning the end-to-end cloud product portfolio across Serverless, Dedicated, and BYOC offerings for Kafka, Pulsar, Flink, and Agentic AI. Kundan leads strategy and execution for lakehouse-native integrations with partners across Iceberg and Delta ecosystems, delivering AI-ready, real-time data platforms. Kundan also owns global partnerships across cloud service providers, ISVs, and system integrators, driving co-build, co-sell, and go-to-market initiatives that accelerate customer adoption, expansion, and new logo growth.

Neng Lu Neng Lu is currently the Director of Platform at StreamNative, where he leads the engineering team in developing the StreamNative ONE Platform and the next-generation Ursa engine. As an Apache Pulsar Committer, he specializes in advancing Pulsar Functions and Pulsar IO Connectors, contributing to the evolution of real-time data streaming technologies. Prior to joining StreamNative, Neng was a Senior Software Engineer at Twitter, where he focused on the Heron project, a cutting-edge real-time computing framework. He holds a Master's degree in Computer Science from the University of California, Los Angeles (UCLA) and a Bachelor's degree from Zhejiang University.

Aravind Suresh Aravind Suresh leads the real-time infrastructure team at OpenAI, where he builds large-scale streaming, real-time, and ML infrastructure that powers AI products like ChatGPT and Sora. Previously, he led infrastructure efforts at Uber to enable exabyte-scale data analytics and AI initiatives across Rides, Eats, and Groceries. With over seven years of experience, Aravind specializes in designing and operating mission-critical, high-throughput data platforms for real-time analytics and machine learning systems.

Ashwin Raja Ashwin Raja is the Co-Founder & CTO of Motorq, a leading SaaS company transforming connected car data into actionable insights. With over two decades in technology, he has built world-class engineering teams and scalable platforms at Microsoft, HBO, and multiple startups. At Motorq, Ashwin drives AI-powered telemetry solutions that help global customers unlock efficiencies and create new business models. A champion of innovation, ownership mindset, and mentorship, he has grown Motorq’s development centers into industry models. Ashwin is passionate about nurturing the next generation of tech leaders while delivering impactful solutions for the automotive and mobility ecosystem.

Onur Karaman Onur is a Sr Staff Engineer at LinkedIn with an interest in distributed systems. He's the tech lead of Northguard, a log storage system with a focus on scalability and operability. Prior to Northguard, Onur was a committer to Apache Kafka, where he focused on Kafka's scalability. He redesigned the cluster's controller, made the controller use ZooKeeper's async APIs, and worked on the group coordinator and consumer group management protocol.

Michelle Leon Michelle is a Product Manager at Databricks, focusing on Unity Catalog and Lakehouse storage. She is based in San Francisco.

Reynold Xin Reynold Xin is a cofounder and Chief Architect at Databricks, where he leads the development of core data systems including Apache Spark, Delta Lake, Photon, and Databricks SQL. He holds a PhD in Computer Science from the University of California, Berkeley, where he specialized in large-scale data systems.
