Goodbye Distributed Locks: Message Orchestration at Scale with Apache Pulsar
Danish Rehman

TL;DR

Attentive tackled the core problem of achieving mutual exclusivity in their event-driven messaging platform to enhance user experience. They transitioned from Redis-based distributed locks to leveraging Apache Pulsar's subscription modes, specifically FAILOVER and KEY_SHARED, to manage message orchestration at scale. This evolution allowed them to maintain orchestration integrity and scalability during high-demand periods like Black Friday.

Opening

Imagine orchestrating a seamless, real-time messaging experience for millions of users during peak shopping events like Black Friday, where a single lapse can overwhelm users with redundant messages. Attentive, a leading marketing platform, faced this challenge head-on while sending 3.9 billion messages over a year, including 620 million on Black Friday alone. By evolving their architecture to ensure mutual exclusivity using Apache Pulsar, they preserved the user experience while handling the intricacies of distributed systems at scale.

What You'll Learn (Key Takeaways)

  • Achieving Mutual Exclusivity with Apache Pulsar – Learn how Attentive moved from Redis locks to Pulsar’s FAILOVER and KEY_SHARED subscription modes to ensure each user receives a single, coherent message experience.
  • Trade-offs and Challenges – Understand the operational hurdles encountered with each Pulsar subscription mode, including head-of-line blocking and scalability concerns, and how they were managed.
  • Real-world Application and Strategy – Discover practical strategies for managing consumer stalling and optimizing throughput, crucial for maintaining performance during peak loads.
  • Observability and Monitoring – Gain insights into the importance of distributed log tracing and bespoke metrics for ensuring system reliability and operational efficiency during architectural transitions.
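
To make the mutual-exclusivity idea concrete, here is a minimal sketch of key-based routing, the general principle behind a KEY_SHARED-style subscription: every event carrying the same key goes to the same consumer, so no lock is needed to keep two workers off the same user. This is an illustration only, not Pulsar's implementation and not Attentive's code. The `route` and `dispatch` helpers, the SHA-256 hash, and the sample events are all assumptions made for the example.

```python
import hashlib
from collections import defaultdict


def route(key: str, num_consumers: int) -> int:
    """Map a message key to a consumer index with a stable hash.

    Assumption: a simple modulo over SHA-256, chosen only because it is
    deterministic across runs. A real broker uses its own hashing scheme.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_consumers


def dispatch(events, num_consumers):
    """Group (key, payload) events by the consumer that owns each key."""
    assignments = defaultdict(list)
    for key, payload in events:
        assignments[route(key, num_consumers)].append((key, payload))
    return dict(assignments)


# Hypothetical events keyed by user ID.
events = [
    ("user-1", "welcome"),
    ("user-2", "cart-reminder"),
    ("user-1", "promo"),
    ("user-3", "welcome"),
    ("user-2", "receipt"),
]

assignments = dispatch(events, num_consumers=3)

# Record which consumers ever saw each key; exclusivity means exactly one.
owners = defaultdict(set)
for consumer, batch in assignments.items():
    for key, _ in batch:
        owners[key].add(consumer)

for key in sorted(owners):
    print(key, "-> consumer", sorted(owners[key]))
```

Because each key has exactly one owner and events are appended in arrival order, per-user ordering is preserved without any shared lock. The trade-off is that a slow key delays every other key assigned to the same consumer.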

Q&A Highlights

Q: Can you share a bit more about your future plans for distributed locks?
A: We aim to move towards a more event-loop-based model, focusing on continuous event acknowledgment while handling timeouts, moving away from traditional fork-and-join approaches.

Q: I'm interested in learning more about the scale of KEY_SHARED subscriptions in your company. Do you have any insights on that?
A: At peak traffic, we manage 1,000 events per second using KEY_SHARED subscriptions, with horizontal scaling being crucial to mitigate the head-of-line blocking problem.
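
The link between horizontal scaling and head-of-line blocking can be illustrated with a toy simulation. It assumes each consumer processes its keys strictly one after another and that keys are spread across consumers by a simple modulo. Both are simplifications made for this sketch, and the function names and costs are hypothetical rather than measurements from the talk.

```python
def simulate(num_keys, num_consumers, slow_key, slow_cost=50, fast_cost=1):
    """Return the finish time of each key's single event.

    Assumptions: keys are integers assigned to consumers by modulo, and
    each consumer works through its queue sequentially.
    """
    queues = [[] for _ in range(num_consumers)]
    for key in range(num_keys):
        queues[key % num_consumers].append(key)

    finish = {}
    for queue in queues:
        clock = 0
        for key in queue:
            clock += slow_cost if key == slow_key else fast_cost
            finish[key] = clock
    return finish


def delayed_keys(finish, slow_key, slow_cost=50):
    """Keys that finished only after waiting behind the slow key."""
    return sorted(
        key for key, done in finish.items()
        if key != slow_key and done > slow_cost
    )


for consumers in (2, 4, 6, 12):
    stuck = delayed_keys(simulate(12, consumers, slow_key=0), slow_key=0)
    print(f"{consumers:>2} consumers: {len(stuck)} keys stuck behind the slow key")
```

In this model, adding consumers does not make the slow key faster. It only reduces how many other keys share a queue with it, which is why scaling out mitigates the problem rather than removing it.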

Danish Rehman
Staff Software Engineer, Attentive

Danish is a Staff Software Engineer at Attentive, where he plays a key role in shaping the architecture and scalability of the company's product. With nearly 15 years of experience in building distributed systems for high-traffic ad-tech and e-commerce, he has been instrumental in developing Attentive's next-generation marketing solutions powered by AI. Outside of work, Danish enjoys gardening, vermiculture, and rock climbing. He loves sharing his harvest with his four-year-old son before they head out on climbing adventures together.
