At Flipkart, there are multiple use cases for high-throughput messaging, such as streaming and batch pipelines, ordered processing, and auditing. Most teams deploy and manage their own messaging backends, which are primarily Kafka clusters. We identified that offering topic-as-a-service can take away operational complexity for these teams and help us enforce stricter SLAs around uptime and geo-replication. This session will talk about our approach towards building a scalable, multi-tenant platform with Pulsar as the backend.
We maintain our own control plane, which offers simpler contracts for onboarding a new user rather than bogging them down with cluster management aspects. The talk will cover some of the design decisions made when offering this as a service in our private cloud, along with technical aspects such as:
How we ensure capacity for new tenants in a shared cluster with reasonable guarantees.
Pulsar offers different kinds of isolation mechanisms: cluster peering, isolation groups, produce/dispatch quotas, etc. We will explain our philosophy when configuring these.
Operational readiness: our approach to redline testing, ensuring zero data loss, and ensuring uptime.
OAuth2 and in-house RBAC integrations to tailor a permission model that works for topic-as-a-service.
How we provide active-passive topics on top of Pulsar (as compared to the active-active topics that come out of the box).
The talk will briefly touch upon the challenges we face when driving adoption and convincing Kafka users to move to this platform. We will wrap up by outlining some of the things we intend to improve in the long term.