High-throughput streaming Lakehouse with Apache Hudi
Shiyan Xu

As real-time data processing demands grow, handling high-throughput streaming ingestion without conflicts is critical. Traditional concurrency control mechanisms often fail under heavy concurrent writes, leading to errors, wasted resources, and stale data.

In this talk, we explore Apache Hudi’s Non-Blocking Concurrency Control (NBCC), an approach that enables conflict-free, high-throughput ingestion from multiple writers into a single table. By combining Hudi's file layouts, TrueTime semantics, and a bucket index, NBCC ensures efficient event-time ordering and integrates with Apache Flink pipelines. This lets teams run simultaneous updates, real-time dataset joins, and multiple writers without bottlenecks.

Key highlights include:

  • NBCC architecture & workflow: Learn how it aligns with Flink for streaming ingestion
  • Conflict-free ingestion: How file layouts, LSM-tree-backed logs, and TrueTime semantics remove bottlenecks
  • Practical demo: Flink SQL demonstration of concurrent writes across multiple streams
  • Future of NBCC: Upcoming enhancements for metadata tables, clustering, and advanced indexing
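
To make the conflict-free ingestion idea concrete, here is a minimal toy model in Python. It is not Hudi's implementation, and every name in it (`ToyTable`, `bucket_for`, `NUM_BUCKETS`) is hypothetical. It only illustrates the general pattern the highlights describe: a deterministic bucket index routes each key to the same bucket for every writer, writers only append log records, and the latest value per key is resolved by event time when the data is read.

```python
import zlib
from collections import defaultdict

NUM_BUCKETS = 4  # hypothetical bucket count, chosen only for illustration


def bucket_for(key: str) -> int:
    # Deterministic hash, so every writer maps a given key to the same bucket.
    return zlib.crc32(key.encode("utf-8")) % NUM_BUCKETS


class ToyTable:
    """Toy append-only table: a sketch of the idea, not Hudi's storage format."""

    def __init__(self):
        # bucket id -> list of appended log records
        self.logs = defaultdict(list)

    def append(self, writer: str, key: str, event_time: int, value: str) -> None:
        # Writers only append; nothing is rewritten in place,
        # so two writers never contend for the same existing record.
        self.logs[bucket_for(key)].append((key, event_time, writer, value))

    def snapshot(self) -> dict:
        # Merge at read time: for each key, the record with the
        # highest event time wins, regardless of arrival order.
        latest = {}
        for records in self.logs.values():
            for key, event_time, _writer, value in records:
                if key not in latest or event_time >= latest[key][0]:
                    latest[key] = (event_time, value)
        return {k: v for k, (_, v) in latest.items()}


table = ToyTable()
table.append("writer_a", "order-1", 10, "created")
table.append("writer_b", "order-1", 12, "shipped")
table.append("writer_a", "order-1", 11, "paid")  # arrives late, older event time
table.append("writer_b", "order-2", 5, "created")
result = table.snapshot()
# result == {"order-1": "shipped", "order-2": "created"}
```

In this sketch the late "paid" record from `writer_a` does not overwrite the newer "shipped" record from `writer_b`, because the merge orders by event time rather than arrival order. A real system must also handle compaction, timestamps that are consistent across writers, and failure recovery, none of which are modeled here.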

Whether you’re building high-throughput streaming pipelines or modern lakehouse solutions, this session provides actionable insights for efficient, scalable, and real-time data ingestion.

Shiyan Xu
Founding team member, Onehouse

Shiyan Xu works as a data architect for open source projects at Onehouse. He is a PMC member of Apache Hudi and currently leads the development of Hudi-rs, the native Rust implementation of Hudi, as well as the writing of the O'Reilly book "Apache Hudi: The Definitive Guide". He also consults with community users and helps run Hudi pipelines at production scale.
