Flink Streaming Ingestion to Cloud-lake at Scale

At Uber, real-time data powers everything, from pricing and matching to logistics and safety. In this session, Uber engineers share how Apache Flink and Apache Hudi are used to build streaming ingestion pipelines for Uber’s Cloud Lake, enabling real-time machine learning and analytics in a hybrid cloud environment.
You’ll get a detailed look into:
- Architecture: How Uber manages thousands of Flink ingestion pipelines with built-in deployment safety, failover mechanisms, and disaster recovery.
- Runtime: How Uber ensures cross-cloud data security, column-level access control, and data privacy at scale.
- Operational Efficiency: How Flink Autoscaler as a Service dynamically adjusts workload parallelism to reduce cost, and how partial sort optimization drives further efficiency.
- Apache Hudi: Deep dive into Hudi’s streaming integration for Flink and its non-blocking concurrency control (NBCC) design.
If you’re running large-scale streaming systems or exploring real-time ingestion for ML and analytics, this session delivers practical insights and architectural patterns proven in Uber’s production environment.


