Safe Streams at Scale: Uber’s Deployment Safety Framework for Flink Jobs
Yusheng Chen
Yao Li

Operating thousands of Apache Flink jobs that power real-time decisioning, pricing, and operational workflows is no small feat, especially when every deployment can affect millions of users. At Uber, maintaining reliability, scalability, and speed across our Flink FaaS (Flink-as-a-Service) platform demanded a new approach to deployment safety at scale.

In this talk, we introduce Uber’s Deployment Safety Framework for Flink Jobs, a system that delivers safe, fast, and automated deployments through full-lifecycle quality control. Learn how we built an ecosystem that combines progressive rollouts, automated testing, and intelligent rollback mechanisms to ensure stability without slowing down innovation.

Key topics include:

  • Deployment Incrementality – Progressive rollouts that limit blast radius and ensure safety.
  • Automation & CI/CD Guardrails – Consistent code and config validation across environments.
  • Unit & End-to-End Testing – Catching risky changes early through automated checks and traffic injection.
  • Smart Rollbacks – Automated, metric-triggered rollbacks that prevent widespread failures.
  • Tenant-Aware Testing – Validation of behavior across real workloads and environments.

We’ll share lessons learned from scaling this system to thousands of streaming jobs, and explain how it has strengthened Uber’s real-time, event-driven data platform.

Whether you manage Flink at scale or operate mission-critical streaming systems, this session offers practical insights into building safe, self-healing deployment pipelines for modern data platforms.

Yusheng Chen
Staff Engineer, Uber

Yusheng Chen is a Staff Engineer on Uber's streaming data analytics platform. His team provides services for developing reliable, scalable, and high-performing stream processing applications. He is the tech lead for bringing safe deployment to Uber's Flink-as-a-Service platform.

Yao Li
Sr. Software Engineer, Uber

Yao Li is a Sr. Software Engineer on Uber's Flink team and an Apache Heron (Incubating) committer. With a PhD and postdoctoral research background in Electronic Engineering, Yao brings deep expertise in real-time streaming systems and large-scale data infrastructure.
