StreamLink: Real-Time Data Ingestion at OpenAI Scale
Adam Richardson

Real-time data ingestion at scale is a cornerstone of modern AI and analytics. In this talk, explore how OpenAI built StreamLink, a high-performance streaming ingestion platform powered by Apache Flink, designed to meet the data needs of both humans and AI systems.

StreamLink ingests more than 30 GiB/s of data from Kafka into Delta Lake and Apache Iceberg tables, supporting 1000+ datasets across 20+ teams, all while maintaining reliability and operational efficiency.

Join this deep dive to learn:

  • How StreamLink enables large-scale, real-time ingestion in OpenAI’s lakehouse
  • The architecture behind its Kubernetes-native deployment using the Flink K8s Operator
  • How adaptive autoscaling and self-service onboarding keep operations fast and lean
  • Best practices and design patterns for building streaming systems at enterprise scale

Whether you’re operating a data platform or scaling streaming infrastructure, this session offers practical insights for powering the next generation of real-time, AI-driven analytics.

Adam Richardson
Member of Technical Staff, OpenAI

Adam Richardson is the tech lead for the Realtime Infrastructure team at OpenAI, which supports hundreds of Kafka and Flink use cases across the entire organization. Previously, Adam was the tech lead for the Data Movement team at Stripe, building and managing ELT pipelines at petabyte scale.
