Flink Streaming Ingestion to Cloud-lake at Scale

Session Overview

Learn how Uber uses Apache Flink and Hudi to power real-time data ingestion into its Cloud Lake for scalable, efficient, and secure machine learning.

At Uber, real-time data powers everything—from pricing and matching to logistics and safety. In this session, Uber engineers share how Apache Flink and Apache Hudi are used to build streaming ingestion pipelines for Uber’s Cloud Lake, enabling real-time machine learning and analytics in a hybrid cloud environment.

You’ll get a detailed look into:

Architecture: How Uber manages thousands of Flink ingestion pipelines with built-in deployment safety, failover mechanisms, and disaster recovery.
Runtime: How Uber ensures cross-cloud data security, column-level access control, and data privacy at scale.
Operational Efficiency: How Flink Autoscaler as a Service dynamically adjusts workload parallelism to reduce cost, and how partial sort optimization drives further efficiency.
Apache Hudi: A deep dive into Hudi’s streaming integration with Flink and its non-blocking concurrency control (NBCC) design.

If you’re running large-scale streaming systems or exploring real-time ingestion for ML and analytics, this session delivers practical insights and architectural patterns proven in Uber’s production environment.

About the Speakers

Zhenqiu Huang is a long-time member of the Apache Flink community. He has built streaming platforms at Uber Technologies and Apple Inc. Most recently, he worked with the Apache Hudi community on building streaming ingestion to the Cloud Lake at Uber.

Shiyan Xu works as a data architect for open source projects at Onehouse. A PMC member of Apache Hudi, he currently leads the development of Hudi-rs, the native Rust implementation of Hudi, and the writing of the O'Reilly book "Apache Hudi: The Definitive Guide". He also consults with community users and helps run Hudi pipelines at production scale.
