
How does a Kafka cluster handle 10× or even 100× traffic spikes while maintaining high throughput and availability? At Netflix, live streaming events place unprecedented demands on our core Kafka infrastructure, requiring innovative solutions to keep services resilient under extreme load.
In this talk, we share Netflix’s blueprint for Kafka resilience, covering strategies that go beyond out-of-the-box configurations to maximize uptime, minimize data loss, and maintain service performance during peak loads.
Key topics include:
- Broker Stability Under Overload: Techniques to ensure Kafka brokers remain stable even during extreme traffic surges.
- Adaptive Clients: Transforming producers and consumers into active participants that dynamically adjust behavior in real time to protect cluster health.
- Operational Insights: Lessons learned from scaling Kafka at Netflix, including monitoring, failure mitigation, and proactive management strategies.
- High-Throughput Design Patterns: Architectures and operational patterns to sustain performance during unpredictable traffic spikes.
Whether you’re a Kafka engineer, platform architect, or operations lead, this talk provides actionable strategies and insights for building resilient, scalable, and high-performing Kafka infrastructure capable of surviving even the most demanding workloads.



