
TL;DR
Attentive tackled a core problem in its event-driven messaging platform: mutual exclusivity, meaning that each user's events are handled by only one worker at a time so that the user experience stays coherent. The team moved from Redis-based distributed locks to Apache Pulsar's subscription modes, specifically FAILOVER and KEY_SHARED, to orchestrate messages at scale. This evolution let them maintain orchestration integrity and scale through high-demand periods such as Black Friday.
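The contrast between the two subscription modes can be sketched in a few lines of Python. This is a toy, in-memory model under stated assumptions, not Pulsar's dispatcher or Attentive's code: the function, consumer and user names are hypothetical, FAILOVER is reduced to "first healthy consumer in a fixed order", and KEY_SHARED is reduced to a `crc32` hash modulo the consumer count (Pulsar itself assigns hash ranges to consumers). What it illustrates is the property that stands in for a per-user lock: under FAILOVER one active consumer sees everything, while under KEY_SHARED work is spread out but all events for one user still land on one consumer, in arrival order.

```python
from __future__ import annotations

import zlib
from collections import defaultdict


def failover_owner(consumers: list[str], healthy: set[str]) -> str | None:
    """FAILOVER-style: one active consumer receives everything; the rest wait.

    Simplified rule (assumption): the first healthy consumer in a fixed order.
    """
    for consumer in consumers:
        if consumer in healthy:
            return consumer
    return None  # nobody is available to take over


def key_shared_owner(user_id: str, consumers: list[str]) -> str:
    """KEY_SHARED-style: each key is pinned to exactly one consumer.

    Assumption: crc32 modulo the consumer count. Pulsar assigns hash ranges
    instead, but the property is the same while membership is stable.
    """
    # crc32 is stable across processes, unlike Python's salted built-in hash()
    return consumers[zlib.crc32(user_id.encode("utf-8")) % len(consumers)]


def dispatch(
    events: list[tuple[str, str]], consumers: list[str]
) -> dict[str, list[tuple[str, str]]]:
    """Group (user_id, payload) events by owning consumer, keeping arrival order."""
    assignment: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for user_id, payload in events:
        assignment[key_shared_owner(user_id, consumers)].append((user_id, payload))
    return dict(assignment)


if __name__ == "__main__":
    consumers = ["consumer-a", "consumer-b", "consumer-c"]  # hypothetical names
    print(failover_owner(consumers, healthy={"consumer-b", "consumer-c"}))  # consumer-b
    events = [
        ("user-1", "cart_abandoned"),
        ("user-2", "signed_up"),
        ("user-1", "purchase"),
        ("user-3", "browse"),
    ]
    for consumer, batch in sorted(dispatch(events, consumers).items()):
        print(consumer, batch)
```

The trade-off is visible even in this toy: FAILOVER serializes all work through one active consumer (per topic or partition), whereas KEY_SHARED keeps parallelism but, with the naive modulo scheme used here, remaps many users whenever a consumer joins or leaves.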
Opening
Imagine orchestrating a seamless, real-time messaging experience for millions of users during peak shopping events like Black Friday, where a single lapse can flood users with redundant messages. Attentive, a leading marketing platform, faced this challenge head-on: it sent 3.9 billion messages over a year, including 620 million on Black Friday alone. By evolving its architecture to guarantee mutual exclusivity with Apache Pulsar, the team preserved the user experience while working through the intricacies of distributed systems at scale.
What You'll Learn (Key Takeaways)
- Achieving Mutual Exclusivity with Apache Pulsar – Learn how Attentive moved from Redis locks to Pulsar’s FAILOVER and KEY_SHARED modes to ensure that each user receives a single, coherent message experience.
- Trade-offs and Challenges – Understand the operational hurdles of each Pulsar subscription mode, including head-of-line blocking and scalability concerns, and how the team managed them.
- Real-world Application and Strategy – Discover practical strategies for handling stalled consumers and optimizing throughput, both crucial for sustaining performance under peak load.
- Observability and Monitoring – Learn why distributed log tracing and bespoke metrics are essential for keeping the system reliable and operationally efficient during architectural transitions.
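Head-of-line blocking is easy to see in a toy timeline. In the simplest design, a consumer works through the events pinned to it one at a time, so a single slow or stalled event delays everything queued behind it, including events for unrelated users that merely share that consumer. The sketch assumes one event in progress per consumer, all events available at time zero and made-up durations; `completion_times` and the user and consumer names are hypothetical.

```python
from __future__ import annotations


def completion_times(
    queues: dict[str, list[tuple[str, int]]],
) -> dict[str, list[tuple[str, int]]]:
    """Return (user_id, finish_ms) for every queued event, per consumer.

    Assumptions: all events are available at t=0 and each consumer handles
    its queue strictly in order, one event at a time (durations in ms).
    """
    result: dict[str, list[tuple[str, int]]] = {}
    for consumer, queue in queues.items():
        clock_ms = 0
        finished: list[tuple[str, int]] = []
        for user_id, duration_ms in queue:
            clock_ms += duration_ms  # the next event cannot start before this ends
            finished.append((user_id, clock_ms))
        result[consumer] = finished
    return result


if __name__ == "__main__":
    # user-a's event stalls for 5 s (say, a slow downstream call); the other
    # two users happen to be pinned to the same consumer.
    shared = {"consumer-1": [("user-a", 5000), ("user-b", 10), ("user-c", 10)]}
    print(completion_times(shared))  # user-b done at 5010 ms, user-c at 5020 ms

    # Same events, with user-b and user-c owned by a second consumer.
    spread = {
        "consumer-1": [("user-a", 5000)],
        "consumer-2": [("user-b", 10), ("user-c", 10)],
    }
    print(completion_times(spread))  # user-b done at 10 ms, user-c at 20 ms
```

Moving the unrelated users to another consumer removes their delay without touching per-user ordering; user-a's own later events would still wait behind the stall.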
Q&A Highlights
Q: Can you share a bit more about your future plans for distributed locks?
A: We aim to move toward an event-loop-based model that acknowledges events continuously while handling timeouts, rather than relying on traditional fork-and-join approaches.
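The difference between the two styles can be sketched with Python's `asyncio`. In the fork-and-join version, a batch of events is started together and nothing is acknowledged until every event finishes, so one slow event holds back the acknowledgments for the whole batch. In the event-loop version, events are pulled continuously, each one is acknowledged as soon as it completes, and an event that exceeds its timeout goes to a failure path instead of blocking the rest. This is a minimal sketch, not Attentive's design: `handle`, the `ack`/`on_timeout` callbacks and the durations are hypothetical stand-ins for real processing and for a Pulsar consumer's acknowledge or negative-acknowledge calls.

```python
from __future__ import annotations

import asyncio
from typing import Callable, Optional, Tuple

Event = Tuple[str, float]  # (event_id, simulated processing time in seconds)


async def handle(event_id: str, duration_s: float) -> str:
    """Hypothetical event processing: just sleeps for the given duration."""
    await asyncio.sleep(duration_s)
    return event_id


async def fork_join(events: list[Event], ack: Callable[[str], None]) -> None:
    """Fork a batch, join on all of it, and only then acknowledge."""
    done = await asyncio.gather(*(handle(e, d) for e, d in events))
    for event_id in done:
        ack(event_id)  # every ack waits for the slowest event in the batch


async def event_loop(
    queue: asyncio.Queue[Optional[Event]],
    ack: Callable[[str], None],
    on_timeout: Callable[[str], None],
    timeout_s: float,
) -> None:
    """Pull events continuously and acknowledge each one as it finishes."""

    async def run_one(event_id: str, duration_s: float) -> None:
        try:
            await asyncio.wait_for(handle(event_id, duration_s), timeout_s)
        except asyncio.TimeoutError:
            on_timeout(event_id)  # e.g. retry or dead-letter; never blocks others
        else:
            ack(event_id)  # acknowledged immediately, independent of other events

    in_flight: set[asyncio.Task[None]] = set()
    while True:
        item = await queue.get()
        if item is None:  # sentinel: stop accepting new events
            break
        task = asyncio.create_task(run_one(*item))
        in_flight.add(task)
        task.add_done_callback(in_flight.discard)
    await asyncio.gather(*in_flight)  # drain in-flight work before returning


async def main() -> None:
    events: list[Event] = [("evt-1", 0.05), ("evt-2", 2.0), ("evt-3", 0.1)]
    await fork_join(events, ack=lambda e: print("fork-join ack:", e))

    queue: asyncio.Queue[Optional[Event]] = asyncio.Queue()
    for item in events:
        queue.put_nowait(item)
    queue.put_nowait(None)  # end of the demo stream
    await event_loop(
        queue,
        ack=lambda e: print("event-loop ack:", e),
        on_timeout=lambda e: print("event-loop timeout:", e),
        timeout_s=0.5,
    )


if __name__ == "__main__":
    asyncio.run(main())
```

With these hypothetical durations, the fork-and-join version prints all three acknowledgments only after the 2-second event finishes, while the event-loop version acknowledges `evt-1` and `evt-3` within about 0.1 s and reports `evt-2` as timed out after 0.5 s.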
Q: I'm interested in learning more about the scale of KEY_SHARED subscriptions at your company. Do you have any insights on that?
A: At peak traffic, we handle 1,000 events per second with KEY_SHARED subscriptions; horizontal scaling is crucial for mitigating head-of-line blocking.
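Why horizontal scaling helps can be estimated with Little's law (average work in flight = arrival rate × average time per item) together with a simple key-spreading argument. The sketch below takes the quoted peak of 1,000 events per second; the 50 ms per-event processing time, the user IDs and the helper names are hypothetical, and it assumes keys hash evenly (via a toy `crc32` modulo assignment) and each consumer handles one event at a time.

```python
from __future__ import annotations

import zlib


def min_consumers(events_per_sec: int, processing_ms: int) -> int:
    """Little's law: events in flight = arrival rate x time each event takes.

    Assuming each consumer works on one event at a time, that number is also
    the minimum consumer count needed just to keep up (ceiling division).
    """
    return (events_per_sec * processing_ms + 999) // 1000


def _slot(key: str, consumers: int) -> int:
    # Toy assignment (assumption): crc32 modulo the number of consumers.
    return zlib.crc32(key.encode("utf-8")) % consumers


def neighbours_of(hot_key: str, keys: list[str], consumers: int) -> int:
    """Count the other keys pinned to the hot key's consumer, i.e. the users
    whose events could queue behind it if it stalls."""
    target = _slot(hot_key, consumers)
    return sum(1 for k in keys if k != hot_key and _slot(k, consumers) == target)


if __name__ == "__main__":
    print(min_consumers(1_000, 50))  # 50 (hypothetical 50 ms per event)
    users = [f"user-{i}" for i in range(10_000)]  # hypothetical user IDs
    for n in (1, 10, 50):
        print(n, "consumer(s):", neighbours_of("user-42", users, n),
              "users share user-42's consumer")
```

With a single consumer, all 9,999 other users sit behind `user-42`; spread over 50 consumers, on the order of 200 share its consumer on average. Scaling out therefore both adds capacity and shrinks the set of users a stalled key can hold up, although `user-42`'s own later events still wait.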