
TL;DR
Netflix addresses the challenge of scaling real-time data processing by developing a Data Mesh platform, leveraging Apache Flink and Kafka to handle trillions of events daily. The solution democratizes data stream processing with Streaming SQL, allowing users to perform complex transformations easily. This enhances Netflix's ability to deliver personalized user experiences and manage operations at a global scale.
Opening
At Netflix, where entertainment meets technology, handling data at an unprecedented scale is crucial. With over 300 million members generating data across more than a billion devices, Netflix processes trillions of events daily. This monumental task is powered by a sophisticated Data Mesh platform, which handles everything from personalized recommendations to game analytics, ensuring a seamless user experience across the globe.
What You'll Learn (Key Takeaways)
- Data Mesh at Scale – Discover how Netflix's Data Mesh operates with over 20,000 Apache Flink jobs and thousands of Kafka topics, enabling real-time data processing for personalized content delivery.
- Democratizing Stream Processing – Learn about the innovative use of Streaming SQL, which simplifies complex data transformations, making real-time processing accessible to a broader range of users.
- Integration with Next-Gen Connectors – Explore how Netflix integrates with systems like Apache Iceberg and Kafka to enhance data analytics and observability.
- Strategic Leadership in Data Platforms – Gain insights into the strategic leadership techniques that drive the development roadmap, planning, and prioritization within Netflix's technical teams.
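The appeal of Streaming SQL is that a transformation which would otherwise require a hand-written Flink job can be expressed declaratively. As an illustrative sketch only (the table, topic, and field names here are invented for the example, not Netflix's actual schemas), a windowed aggregation over an event stream might look like:

```sql
-- Hypothetical source: a Kafka topic of playback events (names are illustrative).
CREATE TABLE playback_events (
    user_id    BIGINT,
    title_id   BIGINT,
    event_type STRING,
    event_time TIMESTAMP(3),
    WATERMARK FOR event_time AS event_time - INTERVAL '5' SECOND
) WITH (
    'connector' = 'kafka',
    'topic'     = 'playback-events',
    'format'    = 'avro'
);

-- A streaming transformation: plays per title over 1-minute tumbling windows.
SELECT
    title_id,
    TUMBLE_START(event_time, INTERVAL '1' MINUTE) AS window_start,
    COUNT(*) AS plays
FROM playback_events
WHERE event_type = 'PLAY'
GROUP BY title_id, TUMBLE(event_time, INTERVAL '1' MINUTE);
```

Flink compiles a query like this into a continuously running streaming job, which is what makes the SQL abstraction accessible to users who never touch the DataStream API.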
Q&A Highlights
Q: Are you using Kafka schema in the CDC pipeline, or purely JSON?
A: We use Avro for event serialization, with a fork of the Confluent Schema Registry for schema management.
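Pairing Avro with a schema registry gives every event a versioned, evolvable contract rather than free-form JSON. As a hedged illustration (this schema is invented for the example and is not an actual Netflix CDC payload), a change event could be described like this:

```json
{
  "type": "record",
  "name": "ChangeEvent",
  "namespace": "example.cdc",
  "fields": [
    {"name": "table",  "type": "string"},
    {"name": "op",     "type": {"type": "enum", "name": "Op",
                                "symbols": ["INSERT", "UPDATE", "DELETE"]}},
    {"name": "ts_ms",  "type": "long"},
    {"name": "before", "type": ["null", "string"], "default": null},
    {"name": "after",  "type": ["null", "string"], "default": null}
  ]
}
```

The registry stores each schema version under an ID embedded in the serialized message, so producers and consumers can evolve independently as long as changes remain compatible.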
Q: What's the latency requirement for log events?
A: Latency requirements vary by use case; ads and live events, for example, need lower latency. The complexity of the pipeline and its data transformations also affects end-to-end latency.
Q: Are you using Flink in batch mode, or purely streaming SQL?
A: Currently, we use Flink in streaming mode, though we are exploring ways to abstract both streaming and batch transformations for ease of use.
Q: Is Data Mesh open source?
A: No, Data Mesh is not open source due to its tight integration with Netflix's internal ecosystem.
Q: How does performance look when performing lookup join with Iceberg tables?
A: Lookup joins are used for small, in-memory dimensional data, such as metadata or geolocation data, ensuring performance efficiency.
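A lookup join enriches each streaming row by probing a small dimension table as the row is processed, which keeps state small when the dimension data fits in memory. A sketch of the Flink SQL pattern (tables, columns, and the JDBC-backed dimension store here are hypothetical; the actual connector behind the dimension side may differ):

```sql
-- Hypothetical fact stream with a processing-time attribute for the lookup.
CREATE TABLE playback_events (
    user_id      BIGINT,
    title_id     BIGINT,
    country_code STRING,
    proc_time AS PROCTIME()
) WITH ('connector' = 'kafka', 'topic' = 'playback-events', 'format' = 'avro');

-- Small dimensional table of geolocation metadata (connector is illustrative).
CREATE TABLE geo_dim (
    country_code STRING,
    country_name STRING
) WITH ('connector' = 'jdbc', 'url' = 'jdbc:postgresql://host/db',
        'table-name' = 'geo_dim');

-- Lookup join: each incoming event probes geo_dim at its processing time.
SELECT e.user_id, e.title_id, g.country_name
FROM playback_events AS e
JOIN geo_dim FOR SYSTEM_TIME AS OF e.proc_time AS g
    ON e.country_code = g.country_code;
```

The `FOR SYSTEM_TIME AS OF` clause is what marks this as a lookup join rather than a regular streaming join, so Flink queries the dimension side per record instead of materializing both inputs as state.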