keynote
25 min
Building the Next Generation of Real Time Data Pipelines: Data Mesh and Streaming SQL at Netflix
Sujay Jain
Guil Pires

TL;DR

Netflix addresses the challenge of scaling real-time data processing by developing a Data Mesh platform, leveraging Apache Flink and Kafka to handle trillions of events daily. The solution democratizes data stream processing with Streaming SQL, allowing users to perform complex transformations easily. This enhances Netflix's ability to deliver personalized user experiences and manage operations at a global scale.

Opening

At Netflix, where entertainment meets technology, handling data at an unprecedented scale is crucial. With over 300 million members generating data across more than a billion devices, Netflix processes trillions of events daily. This monumental task is powered by a sophisticated Data Mesh platform, which handles everything from personalized recommendations to game analytics, ensuring a seamless user experience across the globe.

What You'll Learn (Key Takeaways)

  • Data Mesh at Scale – Discover how Netflix's Data Mesh operates with over 20,000 Apache Flink jobs and thousands of Kafka topics, enabling real-time data processing for personalized content delivery.
  • Democratizing Stream Processing – Learn about the innovative use of Streaming SQL, which simplifies complex data transformations, making real-time processing accessible to a broader range of users.
  • Integration with Next-Gen Connectors – Explore how Netflix integrates with systems like Apache Iceberg and Kafka to enhance data analytics and observability.
  • Strategic Leadership in Data Platforms – Gain insights into the strategic leadership techniques that drive the development roadmap, planning, and prioritization within Netflix's technical teams.

Q&A Highlights

Q: Are you using Kafka schema in the CDC pipeline, or purely JSON?
A: We use Avro for event serialization with a fork of Confluence schema registry for schema management.

Q: What's the latency requirement for log events?
A: Latency requirements vary by use case, such as ads or live events needing lower latency. Complexity of the pipeline and data transformations also affect latency.

Q: Are you using Flink in batch mode, or purely streaming SQL?
A: Currently, we use Flink in streaming mode, though we are exploring ways to abstract both streaming and batch transformations for ease of use.

Q: Is Data Mesh open source?
A: No, Data Mesh is not open source due to its tight integration with Netflix's internal ecosystem.

Q: How does performance look when performing lookup join with Iceberg tables?
A: Lookup joins are used for small, in-memory dimensional data, such as metadata or geolocation data, ensuring performance efficiency.

Sujay Jain
Senior Software Engineer, Netflix

Sujay is a Senior Software Engineer at Netflix, where he is part of the Data Platform team. He tackles a range of challenges in data movement, processing and real-time analytics space and has contributed to key platforms like Data Mesh and Keystone. Prior to joining Netflix, Sujay was part of the Data Infrastructure team at Meta.

Guil Pires
Senior Software Engineer, Data Platform, Netflix

Guil is a Senior Software Engineer in the Data Platform at Netflix, focussing on Data Movement and Stream Processing. His goal since day 1 has been to democratize stream processing by empowering data and software engineers to leverage the power of abstractions. He led the development of Streaming SQL, Pipelines as Code and Druid connectors along with many other innovations in the Stream Processing at Netflix.

Newsletter

Our strategies and tactics delivered right to your inbox

Thank you! Your submission has been received!
Oops! Something went wrong while submitting the form.