Data Streaming Summit Virtual 2025 Recap
Agentic AI: The New Paradigm for Intelligent Systems
“We are entering this new agentic evolution.”
— Sijie Guo, Co-founder & CEO, StreamNative
As the digital world shifts away from static models towards systems of continuous adaptation, the Data Streaming Summit Virtual 2025 spotlighted a decisive transformation: the rise of Agentic AI. Over two days, with more than 36 sessions from top minds in data streaming, the summit mapped out how real-time technologies, open-source ecosystems, and unified architectures are driving the next wave of intelligent systems.
In this recap, we highlight the summit’s three central themes: Agentic AI, the open-source revolution in data infrastructure, and the convergence of stream and batch into the Streaming Lakehouse. We’ll also spotlight user stories and technical innovations shaping the road ahead.
Keynote Highlights
The summit opened with a bold vision. Sijie Guo, CEO of StreamNative, framed the emergence of Agentic AI as “a new evolution,” forecasting a future where autonomous agents powered by real-time streams are not a luxury—but the norm. “Every enterprise will run real-time intelligent agents as a standard part of their operations,” he declared.
Matteo Merli, StreamNative CTO, shared the latest updates on the Ursa Engine: a Kafka-compatible, cloud-native streaming engine built on the success of Apache Pulsar. “Ursa Engine brings 95% cost savings for real-time workloads,” Merli shared, citing its separation of storage and compute, lakehouse-native storage, and compatibility with the Kafka protocol.
The release of Apache Flink 2.0 was another milestone, introduced by Xintong Song of Alibaba Cloud. “Flink 2.0 unlocks more AI use cases with lower costs,” he noted, referencing its disaggregated state management and AI-focused APIs that make streaming more accessible and scalable than ever.
Q6 Cyber offered a practitioner’s perspective on stream-first architecture. In their session, the team shared how they replaced a complex patchwork of cloud services and homegrown queues with Apache Pulsar at the center of their stack, streaming over 75 billion records into a Hudi lakehouse. They overcame serialization bottlenecks and multithreaded performance challenges, and scaled Pulsar Functions to meet the demands of large-scale security telemetry. This real-world story reinforced the summit’s themes of architectural simplification, open-source reliability, and scalability in mission-critical environments.
These announcements reflected an industry-wide push for open, composable data architectures—laying the groundwork for intelligent, event-driven agentic systems at enterprise scale.
Agentic AI: From Model-Centric to Agent-Centric Systems
Agentic AI marks a pivot from passive models to autonomous, goal-driven agents that interact with and respond to real-time signals. This evolution was a central theme across the summit and set the tone for how AI systems will be built and operated in the future.
In a major keynote announcement, Neng Lu, Director of Platform Engineering at StreamNative, introduced the StreamNative Agent Engine, a new runtime that enables autonomous AI agents to process real-time events, reason with context, and take intelligent actions. “The Agent Engine is designed for systems that think and act,” Lu explained. It supports both Kafka and Pulsar protocols, integrates with frameworks like LangChain, LlamaIndex, and Google ADK, and provides a unified registry to manage both deterministic streaming functions and statistical agents (workflows). The Agent Engine combines deterministic event processing with context-aware decision-making—blending the reliability of stream processing with the flexibility of LLMs.
The Agent Engine is built to power the next generation of applications—from real-time copilots to self-healing infrastructure. According to the launch blog, it consists of modular components: a function runtime for event handling, a context store for shared memory, and a decision loop that ties together agents and environments. It positions StreamNative as a pioneer in agent-native infrastructure, bringing AI and stream processing into a unified, programmable environment.
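The Agent Engine’s own API was not shown in detail, but the pattern it packages (consume a real-time event, enrich it with shared context, let an agent decide, emit an action) is easy to sketch. Below is a minimal, hypothetical illustration using the Apache Pulsar Python client; the topic names, the `context` dictionary, and the `decide()` function are stand-ins for the Agent Engine’s context store and LLM-backed decision loop, not the actual StreamNative API.

```python
import json
import pulsar

# Hypothetical sketch of an event-driven agent loop: consume a real-time event,
# combine it with accumulated context, decide on an action, and publish the result.
# The Agent Engine's real API differs; this only illustrates the pattern.
client = pulsar.Client("pulsar://localhost:6650")
consumer = client.subscribe("events-in", subscription_name="agent-demo")
producer = client.create_producer("actions-out")

context = {}  # stand-in for the Agent Engine's shared context store


def decide(event: dict, ctx: dict) -> dict:
    """Hypothetical decision step; in a real agent this is where an LLM call via
    LangChain or LlamaIndex, or a deterministic streaming function, would go."""
    history = ctx.setdefault(event.get("entity", "unknown"), [])
    history.append(event)
    return {"entity": event.get("entity"), "action": "notify", "events_seen": len(history)}


while True:
    msg = consumer.receive()
    try:
        event = json.loads(msg.data())
        action = decide(event, context)                     # context-aware decision
        producer.send(json.dumps(action).encode("utf-8"))   # intelligent action
        consumer.acknowledge(msg)
    except Exception:
        consumer.negative_acknowledge(msg)                  # redeliver on failure
```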
Other sessions expanded on this vision. Mary Grygleski, in her session on generative AI workflows, explained how event-driven systems facilitate asynchronous communication between agents: “Event-driven architectures allow agents to work independently and scale flexibly,” she noted—critical attributes for distributed, multi-agent systems.
Andrew Brooks (Contextual Software) linked real-time streaming directly to business value: “Speed to processing means speed to payment,” he emphasized, illustrating how streaming pipelines improve responsiveness and ROI.
Finally, Hubert Zhang (EloqData) provided an architectural view of how Apache Pulsar supports elastic AI-native pipelines. “Pulsar gives you great streaming. EloqDoc gives you a better document store. ConvertDB ties it all together,” he explained, showing how decoupling compute and storage improves both scalability and cost-efficiency.
Together, these sessions and product launches reflect the rise of agentic architectures as not just a technical shift, but a strategic imperative for building intelligent, autonomous systems. As real-time context becomes the fuel for AI agents, infrastructures like the Agent Engine will be foundational for the next wave of enterprise AI.
Open Source Ecosystem: Innovation through Community
The open-source ethos was front and center at the Data Streaming Summit Virtual 2025, showcasing how diverse communities are collaboratively shaping the future of real-time data streaming. As we enter this new agentic evolution, open source is the foundation that will carry us forward.
This year’s summit featured technical leaders from across the open-source ecosystem: Kafka, Pulsar, and Ursa in the streaming layer; Flink, Spark, and RisingWave in processing; and Iceberg and Hudi in the lakehouse tier. This diversity reflects a maturing community—one that recognizes no single project can address every use case, but together, they form a cohesive and interoperable stack.
Sessions exemplified this collaborative spirit. David Kjerrumgaard from StreamNative and Peter Corless from StarTree introduced StreamQoS, an open standard for defining performance and SLA expectations across Kafka, RabbitMQ, and Pulsar, inviting the community to participate: “Get involved. We want your feedback. That’s the whole point of this.”
Penghui Li demonstrated how Ursa, a Kafka-compatible platform built on open table formats like Iceberg and Delta Lake, dramatically cuts inter-zone traffic via an S3-native architecture. “We saved almost all the internet traffic,” he said, underscoring the cost efficiencies possible through shared infrastructure and community innovation.
Other talks emphasized cross-project interoperability: from validating streaming correctness at scale with tooling built for Kafka, Pulsar, and Ursa, to exploring metadata unification with Oxia, a Zookeeper alternative designed to serve multiple ecosystems. A standout session on the “Apache Kafka API: The Unofficial Standard” explored how compatibility across platforms is helping break down silos while preserving developer familiarity.
As Apurva Mehta of Responsive shared in “Why Stream Processors Must Evolve,” the call for modular, Kubernetes-native, and community-driven processors is clear: “The obvious solution is to unbundle the state and control plane layers of Kafka Streams.”
By bringing together contributors and users from varied technologies—rather than promoting a single project—the summit highlighted a rising movement: one of interoperability over lock-in, modularity over monoliths, and open participation over vendor control.
The Data Streaming Summit is not just a gathering of like-minded developers—it's a reflection of a global, collaborative momentum across the open-source data stack. Whether you’re building with Flink or Spark, Kafka or Pulsar, Iceberg or Hudi, the summit reinforced a unifying message: we’re stronger when we build together.
Streaming Lakehouse: Converging Stream and Batch
A key theme of the summit was the convergence of real-time and historical analytics in a unified architecture: the Streaming Lakehouse. This approach breaks down the silos between streaming ingestion and analytical querying, enabling faster, more cost-effective data pipelines.
In "Fluss: Reinventing Kafka for the Real-Time Lakehouse," Jark Wu of Alibaba Cloud introduced a Kafka-compatible, lakehouse-native engine that supports real-time reads, writes, deletes, and key lookups—all using a columnar storage format. “Fluss supports real-time streaming reads and writes, just like Kafka, but also supports updates, deletes, and key lookups,” Wu explained, highlighting both performance and cost efficiency.
Motorq’s Anirudh TN showcased how streaming into Snowflake with StreamNative’s Kafka connectors and Snowpipe Streaming eliminated the need for intermediate storage. “Our data latency dropped to seconds—and our cost dropped 2.5x,” he noted, affirming the economic case for streaming-first architecture.
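The exact pipeline was not published, but the general shape is a Kafka Connect sink configured for Snowpipe Streaming, so records flow from the topic into Snowflake with no intermediate stage. The sketch below registers such a connector through the Kafka Connect REST API; the endpoint, credentials, topic, and database names are placeholders, and the property names reflect the Snowflake connector documentation rather than Motorq’s actual setup.

```python
import requests

# Illustrative only: register a Snowflake sink that ingests via Snowpipe Streaming,
# so topic data lands in Snowflake without intermediate storage. Endpoint,
# credentials, and names are placeholders, not Motorq's configuration.
connector = {
    "name": "snowflake-streaming-sink",
    "config": {
        "connector.class": "com.snowflake.kafka.connector.SnowflakeSinkConnector",
        "topics": "vehicle-telemetry",
        "snowflake.ingestion.method": "SNOWPIPE_STREAMING",  # the key setting
        "snowflake.url.name": "myaccount.snowflakecomputing.com",
        "snowflake.user.name": "STREAMING_USER",
        "snowflake.private.key": "<private-key>",
        "snowflake.database.name": "TELEMETRY",
        "snowflake.schema.name": "PUBLIC",
    },
}

resp = requests.post("http://localhost:8083/connectors", json=connector, timeout=30)
resp.raise_for_status()
print(resp.json())
```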
Ververica’s Abdul Rehman Zafar presented a blueprint for replacing traditional ETL using Apache Flink, Iceberg, and Paimon. "Using Paimon, you can replace Kafka completely,” he said, positioning Paimon as a streaming-native catalog store that merges batch and stream semantics.
Lee Kear from AWS introduced Amazon S3 Tables, a new abstraction over S3 buckets optimized for high-throughput Iceberg ingestion. “With S3 Tables, you get up to 10 times the transactions per second, or TPS, out of the box,” Kear explained. The system supports real-time analytics with smart partitioning and compaction strategies while simplifying security and performance tuning for streaming data lakes.
Dipankar Mazumdar from Onehouse.ai explored the concurrency challenge in streaming pipelines in his session, "High-Throughput Streaming in Lakehouse with Non-Blocking Concurrency Control (NBCC)". He demonstrated how NBCC in Apache Hudi eliminates write conflicts by enabling simultaneous ingestion across multiple writers. “Non-blocking concurrency control delivers zero write conflicts and consistent reads—all while Flink keeps every writer at full speed,” he explained, marking a significant advancement over traditional Optimistic Concurrency Control models.
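Hudi exposes this as a write-concurrency mode. The rough PySpark sketch below shows the kind of configuration a writer would use; the session itself was Flink-based, and the option values here are my reading of the Hudi 1.0 documentation, so treat them as an approximation rather than the presented setup.

```python
from pyspark.sql import SparkSession

# Approximate sketch (the session used Flink): enable Hudi's non-blocking
# concurrency control so several writers can ingest into the same
# merge-on-read table without blocking or failing each other's commits.
spark = (
    SparkSession.builder.appName("hudi-nbcc-sketch")
    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .getOrCreate()
)

hudi_options = {
    "hoodie.table.name": "telemetry",
    "hoodie.datasource.write.recordkey.field": "event_id",
    "hoodie.datasource.write.precombine.field": "ts",
    "hoodie.datasource.write.table.type": "MERGE_ON_READ",
    # Assumed values per the Hudi 1.0 docs: NBCC instead of optimistic locking,
    # paired with the bucket index.
    "hoodie.write.concurrency.mode": "NON_BLOCKING_CONCURRENCY_CONTROL",
    "hoodie.index.type": "BUCKET",
}

df = spark.createDataFrame(
    [(1, "2025-01-01T00:00:00Z", "login")], ["event_id", "ts", "event_type"]
)

# Each concurrent writer issues the same kind of append; ordering is resolved
# at read/compaction time instead of failing one of the commits.
df.write.format("hudi").options(**hudi_options).mode("append").save("s3://bucket/telemetry")
```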
Together, these sessions highlight how the streaming lakehouse has matured from a theoretical goal to a production-ready design pattern. Whether optimizing for cost, throughput, or developer simplicity, the future of data platforms lies in unifying batch and stream—delivered through an ecosystem of interoperable, cloud-native technologies.
User Spotlights: Real-World Transformation
Real-world use cases at the summit demonstrated how data streaming transforms industries:
- Netflix processes over 14 trillion records daily via Kafka and Flink, using a Data Mesh architecture. “We handle up to 100 million events per second,” said Sujay Jain, describing the pipelines that power real-time recommendations and game analytics.
- A European bank achieved 4x faster performance and 30% cost savings by tuning Flink SQL for lower state usage and optimized memory. “The checkpoint time dropped 60%,” shared Zafar.
- Attentive, a messaging platform, overcame distributed locking challenges using Pulsar’s Key_Shared subscription (a minimal consumer sketch follows this list). “We sent 620 million messages on Black Friday—without issues,” said Staff Engineer Danish Rehman.
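Key_Shared is what makes that possible: multiple consumers share one subscription, and all messages with the same key are delivered in order to the same consumer, so no external distributed lock is needed. Here is a minimal sketch with the Pulsar Python client; the broker URL, topic, and subscription names are placeholders, not Attentive’s configuration.

```python
import pulsar

# Minimal Key_Shared sketch: consumers share one subscription, and every message
# with a given key is routed, in order, to the same consumer. Names below are
# placeholders, not Attentive's setup.
client = pulsar.Client("pulsar://localhost:6650")

consumer = client.subscribe(
    "message-sends",
    subscription_name="dispatchers",
    consumer_type=pulsar.ConsumerType.KeyShared,
)

while True:
    msg = consumer.receive()
    try:
        # Per-key ordering comes from the subscription, not from app-level locks.
        print(msg.partition_key(), msg.data())
        consumer.acknowledge(msg)
    except Exception:
        consumer.negative_acknowledge(msg)
```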
These stories validate the summit’s themes: scalability, elasticity, and real-time intelligence are not theoretical—they're achievable today.
Technical Innovations Shaping the Future
Several technical breakthroughs stood out:
- Fluss eliminates Kafka’s need for compaction by using columnar storage and supporting updates and deletes—ideal for lakehouse-native streaming with real-time and historical query unification.
- Snowpipe Streaming + Kafka Connect accelerates data pipelines with near-zero latency and lower cloud spend by removing intermediate storage and simplifying schema evolution.
- StreamQoS introduces cross-protocol QoS negotiation for messaging systems like Kafka, Pulsar, and RabbitMQ, allowing SLAs to be enforced dynamically via open metadata standards.
- Ursa implements Kafka topic compaction on S3, optimizing for durability and cost. By using minor and major compactions entirely on cloud object storage, consumers can reconstruct state efficiently without broker disks.
- PuppyGraph + Ursa enables real-time graph analytics on data lakes, eliminating the need for dedicated graph databases. Streaming data can be queried using Gremlin or openCypher directly over Iceberg tables—ideal for cybersecurity and observability use cases.
- Oxia provides a cloud-native alternative to Zookeeper, offering scalable metadata and index storage with a sharded architecture and stateless coordination. It supports real-time workloads while minimizing latency and operational overhead.
These innovations signal a future where data infrastructure is modular, intelligent, and optimized for continuous learning.
Looking Ahead: Shaping the Intelligent Data Backbone
The summit made one thing clear: the age of Agentic AI is here, and real-time data is its backbone. Organizations that embrace open-source innovation, unify their data processing with streaming lakehouses, and build for intelligent agents will lead the next decade.
As we stand at the edge of this transformation, the invitation is clear: join the movement. Build systems that are open, intelligent, and always in motion.
Explore more from the Data Streaming Summit:
- 📺 Watch all session recordings
- 🤖 Check out our Agentic AI blog series
- 💡 Submit your talk to the upcoming Data Streaming Summit happening on Sept 29 - 30 in San Francisco.
The agentic evolution is underway – join us in building the intelligent, real-time future!