Insights from streaming 300B telemetry trace spans per day with Flink
Amrit Sarkar
Vineet Khadloya

How do you make sense of 300 billion distributed tracing spans per day? At Salesforce, the Monitoring Cloud Telemetry Tracer team tackles this challenge head-on, using Apache Flink to process massive real-time telemetry streams and construct accurate, up-to-date service dependency maps across hundreds of microservices.

In this session, we’ll share key architectural decisions, scaling lessons, and operational insights from running telemetry pipelines at extreme scale. You’ll learn how we:

  • Process and correlate hundreds of billions of spans in real time
  • Design robust stateful streaming pipelines for telemetry data
  • Handle out-of-order events and massive fan-out scenarios
  • Manage partitioning and state at scale for high-throughput workloads
  • Maintain performance and reliability in mission-critical observability systems
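To make the out-of-order handling above concrete, here is a minimal plain-Java sketch (no Flink dependency; class and method names are illustrative, not from the talk) of the bounded-out-of-orderness watermarking idea that Flink's event-time processing is built on: the watermark trails the highest span timestamp seen by a fixed delay, and any span at or below the watermark is treated as late.

```java
// Sketch of bounded-out-of-orderness watermarking for trace spans.
// The watermark asserts "no span with event time <= watermark is still expected";
// spans that violate this are late and need side-output or late-data handling.
final class BoundedOutOfOrderness {
    private final long maxDelayMs;                    // maximum expected out-of-orderness
    private long maxTimestampSeen = Long.MIN_VALUE;   // highest event time observed so far

    BoundedOutOfOrderness(long maxDelayMs) {
        this.maxDelayMs = maxDelayMs;
    }

    /** Observe a span's event time and return the updated watermark. */
    long onSpan(long eventTimeMs) {
        maxTimestampSeen = Math.max(maxTimestampSeen, eventTimeMs);
        return maxTimestampSeen - maxDelayMs;
    }

    /** A span is late if its event time is at or below the current watermark. */
    boolean isLate(long eventTimeMs) {
        return eventTimeMs <= maxTimestampSeen - maxDelayMs;
    }
}
```

In actual Flink pipelines the equivalent built-in is `WatermarkStrategy.forBoundedOutOfOrderness(Duration)` on the DataStream API; the trade-off is the same either way: a larger delay tolerates more reordering but holds back window results longer.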

This talk is a must-watch for engineers and architects building large-scale telemetry, tracing, or observability pipelines, and for anyone interested in pushing Flink to its real-time processing limits.

Amrit Sarkar
Lead Member of Technical Staff, Salesforce

Amrit Sarkar, an engineer with eight years of experience, specializes in the Search and Big Data domains.

Vineet Khadloya
Senior Member of Technical Staff (SMTS), Salesforce

Engineer on the Tracer and Moncloud API Platforms Team
