High-throughput streaming Lakehouse with Apache Hudi

Session Overview

Discover how Apache Hudi's NBCC enables high-throughput, conflict-free streaming ingestion in Flink-powered lakehouse pipelines for real-time analytics.

As real-time data processing demands grow, handling high-throughput streaming ingestion without conflicts is critical. Traditional concurrency control mechanisms often fail under heavy concurrent writes, leading to errors, wasted resources, and stale data.

In this talk, we explore Apache Hudi’s Non-Blocking Concurrency Control (NBCC), an approach that enables conflict-free, high-throughput ingestion from multiple writers into a single table. By combining Hudi’s file layouts, TrueTime semantics, and a bucket index, NBCC preserves event-time ordering and integrates with Apache Flink pipelines. Teams can then run simultaneous updates, real-time dataset joins, and multiple writers without the writers blocking one another.

Key highlights include:

NBCC architecture & workflow: Learn how it aligns with Flink for streaming ingestion
Conflict-free ingestion: How file layouts, LSM-tree-backed logs, and TrueTime semantics remove bottlenecks
Practical demo: Flink SQL demonstration of concurrent writes across multiple streams
Future of NBCC: Upcoming enhancements for metadata tables, clustering, and advanced indexing
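
To make the conflict-free idea concrete, here is a minimal toy model in Python. It is a sketch under stated assumptions, not Hudi's implementation: the names `ToyTable`, `bucket_id`, and `NUM_BUCKETS` are hypothetical, CRC32 stands in for whatever hash a real bucket index uses, and Hudi's actual file naming, timeline, and merge logic are more involved. It only illustrates the two ideas named above: a deterministic key-to-bucket mapping lets independent writers agree on where a record lives, and append-only logs merged by event time let readers resolve concurrent updates without the writers coordinating.

```python
import zlib
from collections import defaultdict

NUM_BUCKETS = 4  # hypothetical bucket count, chosen only for illustration


def bucket_id(record_key: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Deterministically map a record key to a bucket (stand-in for a bucket index)."""
    return zlib.crc32(record_key.encode("utf-8")) % num_buckets


class ToyTable:
    """Toy table: each bucket is an append-only list of log blocks.

    Writers only ever append their own blocks, so no writer rewrites
    (or has to lock) data written by another writer.
    """

    def __init__(self):
        self.logs = defaultdict(list)  # bucket id -> list of log blocks

    def append(self, writer: str, records: list) -> None:
        # Group the incoming records by bucket, then append one block per bucket.
        per_bucket = defaultdict(list)
        for rec in records:
            per_bucket[bucket_id(rec["key"])].append(rec)
        for bucket, recs in per_bucket.items():
            self.logs[bucket].append({"writer": writer, "records": recs})

    def snapshot(self) -> dict:
        # Merge on read: for each key, keep the record with the latest event time.
        merged = {}
        for blocks in self.logs.values():
            for block in blocks:
                for rec in block["records"]:
                    current = merged.get(rec["key"])
                    if current is None or rec["event_time"] >= current["event_time"]:
                        merged[rec["key"]] = rec
        return merged


batch_a = [
    {"key": "user-1", "event_time": 10, "value": "a1"},
    {"key": "user-2", "event_time": 5, "value": "a2"},
]
batch_b = [
    {"key": "user-1", "event_time": 7, "value": "b1"},
    {"key": "user-3", "event_time": 9, "value": "b3"},
]

table = ToyTable()
table.append("writer-a", batch_a)
table.append("writer-b", batch_b)  # commits later, but carries an older event for user-1
snap = table.snapshot()

# user-1 resolves to writer-a's value because its event time is later,
# even though writer-b appended its block afterwards.
print(snap["user-1"]["value"])
```

In this toy model the merged result does not depend on which writer appended first, as long as event times for a key are distinct; ties would need an explicit tie-breaking rule, which the sketch does not define.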

Whether you’re building high-throughput streaming pipelines or modern lakehouse solutions, this session provides actionable insights for efficient, scalable, and real-time data ingestion.

About Speaker

Shiyan Xu works as a data architect for open source projects at Onehouse. A PMC member of Apache Hudi, he currently leads the development of Hudi-rs, the native Rust implementation of Hudi, and the writing of the book "Apache Hudi: The Definitive Guide" from O'Reilly. He also consults with community users and helps run Hudi pipelines at production scale.
