Executive Summary
- Real-time ML at scale: Discord built a streaming machine learning platform on Apache Flink, Apache Pulsar, and Apache Iceberg to power safety and personalization features for 150 million monthly active users.
- Strategic migration to Pulsar: The team migrated from Google Cloud Pub/Sub to managed Pulsar through StreamNative, gaining low-latency watermarks, partition-style delivery, and an actively maintained Flink connector.
- Double-digit safety improvements: The platform delivered double-digit metric improvements in both spam detection and account security — built by a very small engineering team using a Kappa architecture that unifies batch and real-time processing.
Customer Overview
Discord is a real-time communication platform with over 150 million monthly active users and 9 million monthly active servers. Discord's Applied Machine Learning team, led by Staff ML Engineer David Christle, focuses on two critical domains: safety (anti-abuse, spam detection, account security) and personalization (server discovery, notification optimization).
Operating at this scale means dealing with millions of accounts, spiky traffic patterns, and adversarial actors who actively adapt to detection methods. The speed and scalability of the ML systems directly impact Discord's ability to protect its users.
"The important point in this space is that the speed and scalability of what we build really matter."
Challenges
Discord's existing architecture was built for heuristic-based safety rules — not machine learning. As the team tried to layer ML models onto this foundation, they encountered fundamental limitations at every level.
Stateless rules engine: The existing system could only perform stateless transformations on individual events. Computing even basic aggregations like an average over historical events was difficult. The rules engine could filter and transform, but it couldn't remember — making ML feature engineering nearly impossible.
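To make the gap concrete, here is a minimal Python sketch (not Discord's actual system) contrasting a stateless rule, which judges each event in isolation, with a stateful feature like a per-account running average, which needs memory that a stateless rules engine cannot provide:

```python
from collections import defaultdict

# Stateless rule: each event is judged in isolation; nothing is remembered.
def stateless_rule(event):
    return event["msg_len"] > 500  # e.g., flag unusually long messages

# Stateful feature: a per-account running average requires state that
# persists across events — exactly what the old rules engine lacked.
class RunningAverage:
    def __init__(self):
        self.sums = defaultdict(float)
        self.counts = defaultdict(int)

    def update(self, event):
        key = event["account_id"]
        self.sums[key] += event["msg_len"]
        self.counts[key] += 1
        return self.sums[key] / self.counts[key]

avg = RunningAverage()
print(avg.update({"account_id": "a1", "msg_len": 100}))  # 100.0
print(avg.update({"account_id": "a1", "msg_len": 300}))  # 200.0
```

The field names (`account_id`, `msg_len`) are illustrative, but the structural point holds: even this trivial aggregation is impossible without state.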
Batch/real-time gap: Discord had a batch data engine built on BigQuery with access to production database dumps and analytics events, but the real-time rules engine couldn't access this data. Batch processing is flexible but inherently delayed, and stitching batch-computed features together with real-time signals is error-prone, creating a large surface area for bugs.
Silent ML failures: Unlike traditional software where bugs produce error messages and stack traces, ML model bugs manifest as degraded performance — with no clear signal about what went wrong.
"There's no error message or stack trace that's thrown when you make a bug in a machine learning model — it's like your model just kind of performs worse but there's all sorts of reasons why that could be."
Backfilling requirements: ML models require months of historical training data. If you introduce a new feature, you can't wait months to start collecting data — you need to backfill it over historical events. And if a bug corrupts your training data, you may need to start the entire process over.
Infrastructure complexity: Each ML model required deploying a separate microservice. With attackers constantly adapting their tactics, the team needed to iterate rapidly — but the overhead of managing per-model infrastructure slowed them down.
Google Cloud Pub/Sub limitations: When the team began building their streaming platform with Apache Flink, they initially used Google Cloud Pub/Sub as their messaging layer. This created significant problems: the Flink Pub/Sub connector was poorly maintained, there was no unified batch+stream API, watermark lag reached 20–40 seconds (unacceptable for real-time ML), transactions were missing, and costs were high at Discord's scale.
Solution — Why Pulsar
The decision to migrate from Google Cloud Pub/Sub to Apache Pulsar was driven by several unique capabilities that aligned perfectly with Discord's requirements.
Queue and partition delivery modes: Discord has many queue-style workloads with wide fan-out patterns. Pulsar uniquely supports both queue delivery — which is very popular at Discord — and partitioned delivery with time-ordering guarantees. This dual capability was a key differentiator.
"The key for us is that Pulsar not only has the queue delivery mode, which is very popular in Discord, but it also has a partitioned style delivery mode."
Low-latency watermarks: Pulsar's partitions provide time-ordering guarantees that queues inherently lack. The team leverages these ordering guarantees to achieve very short watermark lag, enabling accurate ML results with low latency — a critical requirement for real-time safety decisions.
"These partitions have time ordering guarantees that queues don't, and we take advantage of that to have a very short watermark kind of experience. And so that's how we can get accurate results with low latency."
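The mechanism behind this can be sketched in a few lines. Assuming events within each partition arrive in event-time order, the latest event time seen on a partition is a safe lower bound for that partition, so the watermark can advance to the minimum across partitions. Unordered queue delivery offers no such bound, forcing the large safety margins that showed up as 20–40 seconds of watermark lag on Pub/Sub:

```python
# With per-partition time ordering, the latest event time observed on each
# partition bounds what can still arrive there, so the pipeline watermark
# is simply the minimum of those per-partition highs.
def watermark(latest_event_time_per_partition):
    return min(latest_event_time_per_partition.values())

latest = {"p0": 1_000, "p1": 980, "p2": 1_010}  # event times in ms
print(watermark(latest))  # 980
```

The partition names and timestamps are made up; the point is that ordering guarantees let the watermark track the stream closely instead of trailing it by tens of seconds.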
Disaggregated storage and compute: Pulsar's architecture separates storage from compute, making scaling straightforward — essential for a small team that can't afford to manage complex infrastructure.
Managed Pulsar through StreamNative: With a very small engineering team, Discord chose not to manage Pulsar infrastructure themselves.
"We use a managed Pulsar cluster through StreamNative."
Active Flink connector: The Pulsar Flink connector was originally developed by StreamNative and donated to the open-source community. Unlike the poorly maintained Pub/Sub connector, it provides unified batch source and sink capabilities and is actively maintained.
Architecture: The team taps off Discord's validation service into a buffer, performs schema translation with protobuf encoding for fast serialization with strong type safety, and routes events into separate organized Pulsar topics. Safety events are also restreamed into dedicated Pulsar topics for downstream ML consumption.
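A hypothetical sketch of the routing step: translate a raw event into a typed record and direct it to a per-category Pulsar topic. Discord encodes with protobuf for speed and type safety; `json.dumps` stands in here so the sketch stays dependency-free, and the topic names and event fields are invented for illustration:

```python
import json

# Hypothetical mapping of event types to organized Pulsar topics.
TOPIC_BY_TYPE = {
    "message_create": "persistent://safety/events/messages",
    "login_attempt": "persistent://safety/events/logins",
}

def route(raw_event):
    # Pick the destination topic, falling back to a catch-all.
    topic = TOPIC_BY_TYPE.get(raw_event["type"], "persistent://safety/events/other")
    # Schema translation: in production this would be protobuf encoding.
    payload = json.dumps({"ts": raw_event["ts"], "body": raw_event["body"]}).encode()
    return topic, payload

topic, payload = route({"type": "login_attempt", "ts": 1, "body": {}})
print(topic)  # persistent://safety/events/logins
```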
Technology Stack
Apache Flink was chosen as the stream processing engine for its unified batch and streaming model — the same code runs for both historical backfills and real-time processing. Flink's rich API layers (SQL → Table → DataStream → Process Functions) give engineers flexibility to operate at the right level of abstraction for each task. It's proven at scale and backed by a vibrant open-source community.
"This framework is so powerful that it lets you filter, transform, join, aggregate; you have so much freedom in how you manipulate the data, and it works very efficiently even in real-time. These pipelines can be very simple, something like event ingestion and deduplication; you can do ETL tasks with them. We can do ML on stream data with pretty low latency."
Apache Iceberg handles historical events beyond 30 days. It provides cost-effective cloud storage with streaming-friendly reads that align on event time — crucial for backfilling ML features. Iceberg also enables data deletion for privacy and compliance requirements. Combined with Pulsar via Flink's Hybrid Source, the system seamlessly switches from historical replay to real-time streaming without code changes.
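The hybrid-source idea can be sketched with plain iterators (Flink's `HybridSource` is the real mechanism): drain the bounded historical source to completion, then switch to the unbounded live source, with the same pipeline code serving both phases. The list and generator below stand in for Iceberg and Pulsar respectively:

```python
from itertools import chain

# Drain the bounded historical source first, then hand off to the live
# source — the downstream pipeline never knows the switch happened.
def hybrid_source(historical, live):
    return chain(iter(historical), live)

def pipeline(events):
    # One transformation serves both backfill and real-time processing.
    return (e.upper() for e in events)

historical = ["a", "b"]           # stands in for Iceberg replay
live = iter(["c"])                # stands in for the endless Pulsar stream
print(list(pipeline(hybrid_source(historical, live))))  # ['A', 'B', 'C']
```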
ONNX enables ML model inference directly within the Flink job — no separate inference service required. Engineers develop models in XGBoost, PyTorch, or TensorFlow, convert them to ONNX format, and load them with the ONNX ML inference library inside the Flink job graph. This eliminates the per-model microservice overhead that plagued their previous architecture.
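The pattern mirrors Flink's rich-function lifecycle: load the model once when the operator opens, then reuse it for every event. In the sketch below a fixed-threshold lambda stands in for creating an ONNX inference session from the exported model file, so the example stays runnable without any ML dependencies:

```python
# Sketch of in-pipeline inference: the model is loaded once in open()
# and applied per event in map(), mimicking a Flink RichMapFunction.
class ScoringOperator:
    def open(self):
        # In the real job this would create an ONNX inference session
        # from the converted model; a toy threshold model stands in here.
        self.model = lambda features: 1.0 if sum(features) > 1.0 else 0.0

    def map(self, features):
        return self.model(features)

op = ScoringOperator()
op.open()
print(op.map([0.9, 0.4]))  # 1.0
```

Keeping inference inside the job graph is what removes the per-model microservice: deploying a new model becomes a pipeline change rather than a new service.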
Kappa architecture: The entire system follows a Kappa architecture — one codebase, one pipeline. Engineers can backfill months of historical data in 3–4 hours, train a new model on the backfilled features, and deploy it to production. No separate batch and streaming codepaths to maintain.
Results
The streaming ML platform has delivered measurable impact across Discord's safety operations:
- Double-digit safety metric improvements in both spam detection and account security, directly protecting Discord's 150 million monthly active users.
- Faster iteration: ML engineers can make changes, backfill historical data within 3–4 hours, train a new model, and deploy — a cycle that previously took days or weeks.
- Reduced bug surface: Point-in-time accuracy of all features eliminates the stitching bugs that plagued their previous batch+real-time hybrid approach.
- Simpler infrastructure: ML models run as operators within the Flink job graph instead of requiring separate microservices, dramatically reducing operational overhead.
- Low maintenance: The Flink Kubernetes Operator manages job lifecycle including deployment, save points, and high availability — keeping operational burden minimal for a small team.
"This whole system was built at Discord with a very tiny number of engineers."
"We basically rely on the amazing open-source community that has tried to solve similar problems and donated this stuff. The technology has hit the critical mass where you can build really powerful stuff."
Future Plans
With the platform proven in production for safety use cases, Discord is expanding its application:
- Personalization: Applying the same streaming ML platform to server discovery and notification optimization, where real-time signals can significantly improve user experience.
- Faster experimentation: Enabling rapid A/B testing of new ML models and features, accelerating the team's ability to iterate on both safety and personalization.
- Broader safety coverage: Expanding to additional anti-abuse use cases beyond spam detection and account security.
- Workload migration: Continuing to migrate existing batch and rules-based workloads onto the unified streaming platform, consolidating infrastructure and reducing complexity.

