TL;DR
The session addresses the challenge of achieving high-throughput, conflict-free streaming ingestion in real-time data processing. The solution presented is Apache Hudi's Non-Blocking Concurrency Control (NBCC), which allows multiple streams to write concurrently to the same table without conflicts. This enables efficient real-time data processing, improving data freshness and reducing wasted compute.
Opening
In the fast-paced world of data streaming, Uber's early data processing challenges highlight a common industry pain point: the struggle to maintain data freshness and processing efficiency amidst high-volume, concurrent data ingestion. Their journey from a cumbersome 24-hour data refresh cycle to real-time updates laid the groundwork for Apache Hudi's evolution. This transformation was driven by the need for a system capable of handling updates, deletes, and upserts while ensuring atomicity and consistency, paving the way for the development of Hudi's Non-Blocking Concurrency Control.
What You'll Learn (Key Takeaways)
- Apache Hudi's Non-Blocking Concurrency Control (NBCC) – Learn how NBCC manages concurrent writes without conflicts, enhancing data throughput and freshness in streaming architectures.
- Integration with Apache Flink – Discover the synergy between Flink and Hudi, leveraging NBCC for efficient, real-time data processing pipelines.
- Innovative File Layout and Indexing – Understand how Hudi's file layout and bucket index support event-time ordering and conflict-free ingestion.
- Future of NBCC – Explore upcoming enhancements like extensions to metadata tables, clustering, and various index types, promising even greater efficiency.
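To make the Flink + Hudi integration concrete, here is a minimal sketch of a Flink SQL table definition that enables NBCC on a merge-on-read table with a bucket index, as the takeaways above describe. This is an illustrative configuration, not from the session itself: the table name, schema, path, and bucket count are assumptions, and option names reflect the Hudi Flink connector (Hudi 1.0+, where `hoodie.write.concurrency.mode` accepts `NON_BLOCKING_CONCURRENCY_CONTROL`); consult the Hudi docs for your version before relying on them.

```sql
-- Hypothetical Flink SQL DDL: a MOR Hudi table set up for
-- non-blocking concurrent writers (NBCC requires the bucket index
-- and a merge-on-read table so concurrent deltas can be merged later).
CREATE TABLE rides (
  ride_id   STRING PRIMARY KEY NOT ENFORCED,
  driver_id STRING,
  fare      DOUBLE,
  event_ts  TIMESTAMP(3)
) WITH (
  'connector' = 'hudi',
  'path' = 's3://warehouse/rides',                -- illustrative path
  'table.type' = 'MERGE_ON_READ',
  'index.type' = 'BUCKET',                        -- bucket index: deterministic
  'hoodie.bucket.index.num.buckets' = '4',        -- record-to-file mapping
  'hoodie.write.concurrency.mode' = 'NON_BLOCKING_CONCURRENCY_CONTROL',
  'precombine.field' = 'event_ts'                 -- event-time ordering on merge
);
```

Because the bucket index maps each record key to a fixed bucket, two concurrent Flink jobs writing to this table produce deltas for the same file groups without locking each other out; the merge (driven by the precombine field) resolves ordering at compaction or read time.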
Q&A Highlights
Q: How does Hudi differ from Iceberg in handling write-heavy workloads?
A: Hudi excels in low-latency, write-heavy scenarios due to its rich indexing and native table management services, making it ideal for high-frequency streaming workloads.

Q: What optimizations does Hudi provide for low latency compared to other data lakehouses?
A: Hudi's design includes built-in compaction, clustering, and indexing, allowing for efficient upserts and fast data processing, ideal for real-time analytics.

Q: Is table maintenance built into Hudi?
A: Yes, Hudi incorporates native table management services, allowing for seamless compaction, clustering, and maintenance without relying on external scheduling or compute engines.
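As a hedged illustration of the built-in table services mentioned in the last answer, the options below show how asynchronous compaction and clustering can be scheduled from the same Flink job that writes the table, with no external scheduler. The option names are from the Hudi Flink connector, but the thresholds are invented for the example; verify names and defaults against the Hudi configuration reference for your release.

```sql
-- Hypothetical maintenance options on a Hudi Flink table:
-- compaction and clustering run as async table services inside
-- the writing job, instead of via an external orchestrator.
ALTER TABLE rides SET (
  'compaction.async.enabled' = 'true',      -- merge MOR deltas in the background
  'compaction.delta_commits' = '5',         -- trigger after N delta commits (illustrative)
  'clustering.schedule.enabled' = 'true',   -- periodically rewrite small files
  'clustering.async.enabled' = 'true'
);
```

The design choice here is that maintenance is a first-class part of the write path: the same process that ingests data schedules and executes compaction plans, which is what the answer means by not relying on external scheduling or compute engines.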