
Real-time data ingestion at scale is a cornerstone of modern AI and analytics. In this talk, explore how OpenAI built StreamLink, a high-performance streaming ingestion platform powered by Apache Flink, designed to meet the data needs of both humans and AI systems.
StreamLink ingests more than 30 GiB/s of data from Kafka into Delta Lake and Apache Iceberg tables, supporting 1000+ datasets across 20+ teams, all while maintaining reliability and operational efficiency.
Join this deep dive to learn:
- How StreamLink enables large-scale, real-time ingestion in OpenAI’s lakehouse
- The architecture behind its Kubernetes-native deployment using the Flink K8s Operator
- How adaptive autoscaling and self-service onboarding keep operations fast and lean
- Best practices and design patterns for building scalable streaming systems at enterprise scale
Whether you’re operating a data platform or scaling streaming infrastructure, this session offers practical insights for powering the next generation of real-time, AI-driven analytics.

