[{"data":1,"prerenderedAt":1392},["ShallowReactive",2],{"active-banner":3,"navbar-featured-partner-blog":24,"navbar-pricing-featured":306,"blog-\u002Fblog\u002Fpulsar-newbie-guide-for-kafka-engineers-part-4-subscriptions-consumers":1086,"blog-authors-\u002Fblog\u002Fpulsar-newbie-guide-for-kafka-engineers-part-4-subscriptions-consumers":1325,"related-\u002Fblog\u002Fpulsar-newbie-guide-for-kafka-engineers-part-4-subscriptions-consumers":1373},{"id":4,"title":5,"date":6,"dismissible":7,"extension":8,"link":9,"link2":10,"linkText":11,"linkText2":12,"meta":13,"stem":21,"variant":22,"__hash__":23},"banners\u002Fbanners\u002Flakestream-ufk-launch.md","StreamNative Introduces Lakestream Architecture and Launches Native Kafka Service","2026-04-07",true,"md","\u002Fblog\u002Ffrom-streams-to-lakestreams","https:\u002F\u002Fconsole.streamnative.cloud\u002Fsignup?from=banner_lakestream-launch","Read Announcement","Sign Up Now",{"body":14},{"type":15,"value":16,"toc":17},"minimark",[],{"title":18,"searchDepth":19,"depth":19,"links":20},"",2,[],"banners\u002Flakestream-ufk-launch","default","zRueBGutATZB0ZnFFHwaEV7F0Di4tnZUHhgOiI4cu6k",{"id":25,"title":26,"authors":27,"body":29,"canonicalUrl":289,"category":290,"createdAt":289,"date":291,"description":292,"extension":8,"featured":7,"image":293,"isDraft":294,"link":289,"meta":295,"navigation":7,"order":296,"path":297,"readingTime":298,"relatedResources":289,"seo":299,"stem":300,"tags":301,"__hash__":305},"blogs\u002Fblog\u002Fstreamnative-recognized-in-the-forrester-wave-streaming-data-platforms-2025.md","StreamNative Recognized as a Contender in The Forrester Wave™: Streaming Data Platforms, Q4 2025",[28],"David Kjerrumgaard",{"type":15,"value":30,"toc":276},[31,39,47,51,67,73,78,81,87,102,109,115,118,124,127,134,140,143,146,157,163,169,172,175,178,184,191,194,197,204,207,210,224,229,233,237,241,245,249,251,268,270],[32,33,35],"h3",{"id":34},"receives-highest-possible-scores-in-both-the-messaging-and-resource-optimization-criteria",[36,37,38],"em",{},"Receives Highest Possible Scores in BOTH the Messaging and Resource Optimization Criteria",[40,41,43],"h2",{"id":42},"introduction",[44,45,46],"strong",{},"Introduction",[48,49,50],"p",{},"Real-time data has become the backbone of modern innovation. As artificial intelligence (AI) and digital services demand instantaneous insights, organizations are realizing that streaming data is no longer optional – it's essential for delivering timely, context-rich experiences. StreamNative's data streaming platform is built precisely for this reality, ensuring data is immediate, reliable, and ready to power critical applications.",[48,52,53,54,63,64],{},"Today, we're excited to announce that Forrester Research has named StreamNative as a Contender in its evaluation, ",[55,56,58],"a",{"href":57},"\u002Freports\u002Frecognized-in-the-forrester-wave-tm-streaming-data-platforms-q4-2025",[36,59,60],{},[44,61,62],{},"The Forrester Wave™: Streaming Data Platforms, Q4 2025",". 
This report evaluated 15 top streaming data platform providers, and we're proud to share that ",[44,65,66],{},"StreamNative received the highest scores possible—5 out of 5—in both the Messaging and Resource Optimization criteria.",[48,68,69,70],{},"***Forrester's Take: ***",[36,71,72],{},"\"StreamNative is a good fit for enterprises that want an Apache Pulsar implementation that is also compatible with Kafka APIs.\"",[48,74,75],{},[36,76,77],{},"— The Forrester Wave™: Streaming Data Platforms, Q4 2025",[48,79,80],{},"Being recognized in the Forrester Wave is a proud milestone, and for us, it highlights how far StreamNative has come in enabling enterprises to unlock the power of real-time data. In the sections below, we'll dive into what we believe sets StreamNative apart—from our modern architecture and cloud-native design to our open-source foundation and real-time use cases—and how we see these strengths aligning with Forrester's findings.",[40,82,84],{"id":83},"trusted-by-industry-leaders",[44,85,86],{},"Trusted by Industry Leaders",[48,88,89,90,93,94,97,98,101],{},"Companies across industries are already leveraging StreamNative to drive real-time outcomes. Global enterprises like ",[44,91,92],{},"Cisco"," rely on StreamNative to handle massive IoT telemetry, supporting 245 million+ connected devices. Martech leaders such as ",[44,95,96],{},"Iterable"," process billions of events per day with StreamNative for hyper-personalized customer engagement. And in financial services, ",[44,99,100],{},"FICO"," trusts StreamNative to power its real-time fraud detection and analytics pipelines with a secure, scalable streaming backbone.",[48,103,104,105,108],{},"The Forrester report notes that, “",[36,106,107],{},"Customers appreciate the lower infrastructure costs that result from StreamNative’s cost-efficient, Kafka-compatible architecture. Customers note excellent support responsiveness…","”",[40,110,112],{"id":111},"modern-cloud-native-architecture-built-for-scale",[44,113,114],{},"Modern, Cloud-Native Architecture Built for Scale",[48,116,117],{},"From day one, StreamNative was designed with a modern architecture to meet the demanding scale and flexibility requirements of real-time data. Unlike legacy streaming systems that often rely on tightly coupled storage and compute, StreamNative's platform takes a cloud-native approach: it decouples these layers to enable elastic scalability and efficient resource utilization across any environment. The core is powered by Apache Pulsar—a distributed messaging and streaming engine—enhanced with multi-protocol support (including native Apache Kafka API compatibility) to unify diverse data streams under one roof. This means organizations can consolidate siloed messaging systems and handle both high-volume event streams and traditional message queues on a single platform, without sacrificing performance or reliability.",[48,119,120,121,108],{},"Forrester's evaluation described that “",[36,122,123],{},"StreamNative aims to provide a high-performance, multi-protocol streaming data platform: It uses Apache Pulsar with Kafka API compatibility to deliver cost-efficient, real-time applications for enterprises. 
It appeals to organizations that want a flexible, low-cost streaming solution, due to its focus on scalability and resource optimization, while its investments in Pulsar’s open-source ecosystem and performance optimization make it the primary platform for enterprises wishing to implement Pulsar.",[48,125,126],{},"Our cloud-first, leaderless architecture (with no single broker bottlenecks) and tiered storage model were built to maximize throughput and cost-efficiency for real-time workloads. By separating compute from storage and leveraging distributed object storage, StreamNative can retain huge volumes of event data indefinitely while keeping compute costs in check—effectively providing a flexible, low-cost streaming solution.",[48,128,129,130,133],{},"This modern design not only delivers high performance, but also ensures fault tolerance and geo-distribution out of the box, so enterprises can trust their streaming data is always available and durable. As Forrester’s evaluation noted, StreamNative ",[36,131,132],{},"\"excels at messaging and resource optimization\" and “Its platform supports use cases like real-time analytics and event-driven architectures with robust scalability.","” Our architecture provides the strong foundation that today's real-time applications demand, from ultra-fast data ingestion to seamless scale-out across hybrid and multi-cloud environments.",[40,135,137],{"id":136},"open-source-foundation-and-pulsar-expertise",[44,138,139],{},"Open Source Foundation and Pulsar Expertise",[48,141,142],{},"StreamNative's DNA is rooted in open source innovation. Our founders are the original creators of Apache Pulsar, and we've built our platform with the same open principles: freedom, flexibility, and community-driven innovation. For developers and data teams, this means adopting StreamNative comes with no proprietary lock-in—instead, you get a platform built on open standards and a thriving ecosystem. We offer broad API compatibility (Pulsar, Kafka, JMS, MQTT, and more) so that teams can work with familiar interfaces and integrate StreamNative into existing systems with ease.",[48,144,145],{},"StreamNative is the primary commercial contributor to the Apache Pulsar project and its surrounding ecosystem. We invest heavily in Pulsar's ongoing improvements our investments in Pulsar's open-source ecosystem and performance optimization bolster StreamNative's value. We also foster a vibrant community through initiatives like the Data Streaming Summit and free training resources.",[48,147,148,149,152,153,156],{},"Forrester's assessment noted that StreamNative’s “",[36,150,151],{},"events-driven agents, extensibility, and performance architecture are solid,","” and we're continuing to build on that foundation. ",[44,154,155],{},"We're actively investing in expanding our tooling for observability, governance, schema management, and developer productivity","—areas we recognize as critical for enterprise adoption and where we're committed to accelerating our roadmap.",[48,158,159,160],{},"Being open also means embracing an open ecosystem of technologies. StreamNative actively integrates with the tools and platforms that matter most to our users. We partner with industry leaders like Snowflake, Databricks, Google, and Ververica to ensure our streaming platform works seamlessly with data warehouses, lakehouse storage, and stream processing frameworks. 
Forrester’s evaluation observed that StreamNative’s ",[36,161,162],{},"\"investments in Pulsar’s open-source ecosystem and performance optimization make it the primary platform for enterprises wishing to implement Pulsar.\"",[40,164,166],{"id":165},"powering-real-time-use-cases-across-industries",[44,167,168],{},"Powering Real-Time Use Cases Across Industries",[48,170,171],{},"One of the greatest validations of StreamNative's approach is the success our customers are achieving with real-time data. StreamNative's platform is versatile and use-case agnostic—if an application demands high-volume, low-latency data movement, we can power it. This flexibility is why our customer base spans industries from finance and IoT to major automobile manufacturers and online gaming. The common thread is that these organizations need to process and react to data in milliseconds, and StreamNative is delivering the capabilities to make that possible.",[48,173,174],{},"Cisco uses StreamNative to underpin an IoT telemetry system of colossal scale, connecting hundreds of millions of devices and thousands of enterprise clients with real-time data streams. The platform's multi-tenant design and proven reliability allow Cisco to offer its customers a live feed of device data with unwavering confidence. In the financial sector, FICO has built streaming pipelines on StreamNative to detect fraud as transactions happen and to monitor systems in real time. With StreamNative's strong guarantees around message durability and ordering, FICO can catch anomalies or suspicious patterns within seconds. And in digital customer engagement, Iterable relies on StreamNative to process billions of events every day—clicks, views, purchases—so that marketers can trigger personalized campaigns instantly based on user behavior.",[48,176,177],{},"Our customers uniformly deal with mission-critical data streams, where downtime or delays are unacceptable. StreamNative's fault-tolerant, scalable infrastructure has proven equal to the task, handling scenarios like bursting to millions of events per second or seamlessly spanning multiple cloud regions. Forrester's report recognized StreamNative for supporting event-driven architectures with robust scalability—which for us is a reflection of our platform's ability to meet the most demanding enterprise requirements.",[40,179,181],{"id":180},"continuing-to-innovate-ursa-orca-and-the-road-ahead",[44,182,183],{},"Continuing to Innovate: Ursa, Orca, and the Road Ahead",[48,185,186,187,190],{},"While we are thrilled to be recognized in Forrester's Streaming Data Platforms Wave, we view this as just the beginning. StreamNative's vision has always been bold: to ",[44,188,189],{},"provide a unified platform that not only handles today's streaming needs but also anticipates the emerging requirements of tomorrow",".",[48,192,193],{},"One key area of focus is the convergence of streaming data with advanced analytics and AI. As Forrester points out in the report, technology leaders should look for platforms that natively integrate messaging, stream processing, and analytics to provide AI agents with real-time, contextualized information. We couldn't agree more. 
Our award-winning Ursa Engine and Orca Agent Engine are aimed at extending our platform up the stack—bridging the gap between data streams and data lakes, and between event streams and intelligent processing.",[48,195,196],{},"Our new Ursa Engine introduces a lakehouse-native approach to streaming: it can write events directly to table formats like Iceberg on cloud storage, eliminating entire classes of ETL jobs and making fresh data instantly available for analytics queries. By integrating streaming and lakehouse technologies, we help customers collapse data silos and accelerate their AI\u002FML pipelines.",[48,198,199,200,203],{},"Beyond analytics integration, we are also enhancing StreamNative with more out-of-the-box processing and governance capabilities. In the coming months, we plan to introduce new features for lightweight stream processing and transformation, making it easier to build reactive applications directly on the platform. We're also expanding our ecosystem of connectors and integrations, so that whether your data lands in Snowflake, Databricks, or an AI model, StreamNative will seamlessly feed it. ",[44,201,202],{},"We're investing significantly in enterprise features including security, schema registry, governance, and monitoring tooling","—capabilities that are essential for mission-critical deployments and where we're committed to continued improvement.",[48,205,206],{},"This recognition from Forrester energizes us to keep innovating at full speed. We're sharing this honor with our amazing customers, community, and partners who drive us forward every day. Your feedback and real-world challenges have helped shape StreamNative into what it is today, and together, we will shape the future of streaming data. Thank you for joining us on this journey—we're just getting started, and we can't wait to deliver even more value as we continue to evolve our platform. 
Onward to real-time everything!",[208,209],"hr",{},[32,211,213],{"id":212},"streamnative-in-the-forrester-wave-evaluation-findings",[44,214,215,216,223],{},"StreamNative in ",[44,217,218],{},[55,219,220],{"href":57},[44,221,222],{},"The Forrester Wave™",": Evaluation Findings",[225,226,228],"h5",{"id":227},"recognized-as-a-contender-among-15-streaming-data-platform-providers","• Recognized as a Contender among 15 streaming data platform providers",[225,230,232],{"id":231},"received-the-highest-scores-possible-50-in-both-the-messaging-and-resource-optimization-criteria","* Received the highest scores possible (5.0) in both the Messaging and Resource Optimization criteria",[225,234,236],{"id":235},"cited-as-the-primary-platform-for-enterprises-wishing-to-implement-pulsar","• Cited as the primary platform for enterprises wishing to implement Pulsar",[225,238,240],{"id":239},"noted-for-excelling-at-messaging-and-resource-optimization","• Noted for excelling at messaging and resource optimization",[225,242,244],{"id":243},"customers-cited-lower-infrastructure-costs-and-excellent-support-responsiveness","• Customers cited lower infrastructure costs and excellent support responsiveness",[225,246,248],{"id":247},"recognized-for-supporting-event-driven-architectures-with-robust-scalability","• Recognized for supporting event-driven architectures with robust scalability",[208,250],{},[252,253,255,256,259,260,190],"h6",{"id":254},"forrester-disclaimer-forrester-does-not-endorse-any-company-product-brand-or-service-included-in-its-research-publications-and-does-not-advise-any-person-to-select-the-products-or-services-of-any-company-or-brand-based-on-the-ratings-included-in-such-publications-information-is-based-on-the-best-available-resources-opinions-reflect-judgment-at-the-time-and-are-subject-to-change-for-more-information-read-about-forresters-objectivity-here","**Forrester Disclaimer: **",[36,257,258],{},"Forrester does not endorse any company, product, brand, or service included in its research publications and does not advise any person to select the products or services of any company or brand based on the ratings included in such publications. Information is based on the best available resources. Opinions reflect judgment at the time and are subject to change",". *For more information, read about Forrester’s objectivity *",[55,261,265],{"href":262,"rel":263},"https:\u002F\u002Fwww.forrester.com\u002Fabout-us\u002Fobjectivity\u002F",[264],"nofollow",[36,266,267],{},"here",[208,269],{},[252,271,273],{"id":272},"apache-apache-pulsar-apache-kafka-apache-flink-and-other-names-are-trademarks-of-the-apache-software-foundation-no-endorsement-by-apache-or-other-third-parties-is-implied",[36,274,275],{},"Apache®, Apache Pulsar®, Apache Kafka®, Apache Flink® and other names are trademarks of The Apache Software Foundation. No endorsement by Apache or other third parties is implied.",{"title":18,"searchDepth":19,"depth":19,"links":277},[278,280,281,282,283,284,285],{"id":34,"depth":279,"text":38},3,{"id":42,"depth":19,"text":46},{"id":83,"depth":19,"text":86},{"id":111,"depth":19,"text":114},{"id":136,"depth":19,"text":139},{"id":165,"depth":19,"text":168},{"id":180,"depth":19,"text":183,"children":286},[287],{"id":212,"depth":279,"text":288},"StreamNative in The Forrester Wave™: Evaluation Findings",null,"Company","2025-12-16","StreamNative is recognized in The Forrester Wave™: Streaming Data Platforms, Q4 2025. 
Discover why Forrester highlights StreamNative's high-performance messaging, efficient resource use, and cost-effective Kafka API compatibility for real-time innovation.","\u002Fimgs\u002Fblogs\u002F693bd36cf01b217dcb67278f_Streamnative_blog_thumbnail.png",false,{},0,"\u002Fblog\u002Fstreamnative-recognized-in-the-forrester-wave-streaming-data-platforms-2025","10 mins read",{"title":26,"description":292},"blog\u002Fstreamnative-recognized-in-the-forrester-wave-streaming-data-platforms-2025",[302,303,304],"Announcements","Real-Time","Forrester","5Nr1vAcqlQ7yFQfdL0a3MLsNFerVmEOQJXD9Twz5lx8",{"id":307,"title":308,"authors":309,"body":314,"canonicalUrl":289,"category":1073,"createdAt":289,"date":1074,"description":1075,"extension":8,"featured":7,"image":1076,"isDraft":294,"link":289,"meta":1077,"navigation":7,"order":296,"path":1078,"readingTime":1079,"relatedResources":289,"seo":1080,"stem":1081,"tags":1082,"__hash__":1085},"blogs\u002Fblog\u002Fhow-we-run-a-5-gb-s-kafka-workload-for-just-50-per-hour.md","How We Run a 5 GB\u002Fs Kafka Workload for Just $50 per Hour",[310,311,312,313],"Matteo Meril","Neng Lu","Hang Chen","Penghui Li",{"type":15,"value":315,"toc":1043},[316,319,322,325,328,331,335,338,348,354,357,365,370,374,381,384,387,395,399,402,407,411,414,417,420,423,432,436,439,450,453,457,460,463,474,477,481,485,493,496,500,508,537,541,544,549,553,556,560,563,566,571,580,585,588,591,602,606,609,620,624,627,630,635,638,667,671,673,679,682,687,692,695,699,713,717,728,732,747,756,767,770,773,777,780,783,794,797,800,803,808,813,817,821,838,842,856,861,865,876,879,895,899,910,915,920,928,932,935,939,946,950,953,962,967,976,982,991,1000,1009,1018,1027,1035],[48,317,318],{},"The rise of DeepSeek has shaken the AI infrastructure market, forcing companies to confront the escalating costs of training and deploying AI models. But the real pressure point isn’t just compute—it’s data acquisition and ingestion costs.",[48,320,321],{},"As businesses rethink their AI cost-containment strategies, real-time data streaming is emerging as a critical enabler. The growing adoption of Kafka as a standard protocol has expanded cost-efficient options, allowing companies to optimize streaming analytics while keeping expenses in check.",[48,323,324],{},"Ursa, the data streaming engine powering StreamNative’s managed Kafka service, is built for this new reality. With its leaderless architecture and native lakehouse storage integration, Ursa eliminates costly inter-zone network traffic for data replication and client-to-broker communication while ensuring high availability at minimal operational cost.",[48,326,327],{},"In this blog post, we benchmarked the infrastructure cost and total cost of ownership (TCO) for running a 5GB\u002Fs Kafka workload across different Kafka vendors, including Redpanda, Confluent WarpStream, and AWS MSK. Our benchmark results show that Ursa can sustain 5GB\u002Fs Kafka workloads at just 5% of the cost of traditional streaming engines like Redpanda—making it the ideal solution for high-performance, cost-efficient ingestion and data streaming for data lakehouses and AI workloads.",[48,329,330],{},"Note: We also evaluated vanilla Kafka in our benchmark; however, for simplicity, we have focused our cost comparison on vendor solutions rather than self-managed deployments. That said, it is important to highlight that both Redpanda and vanilla Kafka use a leader-based data replication approach. 
In a data-intensive, network-bound workload like 5GB\u002Fs streaming, with the same machine type and replication factor, Redpanda and vanilla Kafka produced nearly identical cost profiles.",[40,332,334],{"id":333},"key-benchmark-findings","Key Benchmark Findings",[48,336,337],{},"Ursa delivered 5 GB\u002Fs of sustained throughput at an infrastructure cost of just $54 per hour. For comparison:",[339,340,341,345],"ul",{},[342,343,344],"li",{},"MSK: $303 per hour → 5.6x more expensive compared to Ursa",[342,346,347],{},"Redpanda: $988 per hour → 18x more expensive compared to Ursa",[48,349,350],{},[351,352],"img",{"alt":18,"src":353},"\u002Fimgs\u002Fblogs\u002F679c71b67d9046f26edc7977_AD_4nXfvTqyBNUBu2lObdkKAx-5UNkpNP8UYULLZyOcixE6z99VMZUUEsUqWjzexI7vjyNGRNSAUoM9smYvdTP55ctAhIbrs5lmQgcSVMWdaoigbWouCl95DVSQsxooY-qqfGcYqS4g4zA.png",[48,355,356],{},"Beyond infrastructure costs, when factoring in both storage pricing, vendor pricing and operational expenses, Ursa’s total cost of ownership (TCO) for a 5GB\u002Fs workload with a 7-day retention period is:",[339,358,359,362],{},[342,360,361],{},"50% cheaper than Confluent WarpStream",[342,363,364],{},"85% cheaper than MSK and Redpanda",[48,366,367],{},[351,368],{"alt":18,"src":369},"\u002Fimgs\u002Fblogs\u002F679c602d77e9c706de5343b8_AD_4nXeDv8rrv_C1CTCCiqYo1zpvlGYbdBk1r0VEqovAPu22iFMQZgh54Hfw9PBMLzM7jDFxKwAFDxbdG0np4XVk_tGsWhEKMloLRcmmea7lvueCx-0cFsyaE3Mya4Mxc1Dox95A6JEc.png",[40,371,373],{"id":372},"ursa-highly-cost-efficient-data-streaming-at-scale","Ursa: Highly Cost-Efficient Data Streaming at Scale",[48,375,376,380],{},[55,377,379],{"href":378},"\u002Fblog\u002Fursa-reimagine-apache-kafka-for-the-cost-conscious-data-streaming","Ursa"," is a next-generation data streaming engine designed to deliver high performance at a fraction of the cost of traditional disk-based solutions. It is fully compatible with Apache Kafka and Apache Pulsar APIs, while leveraging a leaderless, lakehouse-native architecture to maximize scalability, efficiency, and cost savings.",[48,382,383],{},"Ursa’s key innovation is separating storage from compute and decoupling metadata\u002Findex operations from data operations by utilizing cloud object storage (e.g., AWS S3) instead of costly inter-zone disk-based replication. It also employs open lakehouse formats (Iceberg and Delta Lake), enabling columnar compression to significantly reduce storage costs while maintaining durability and availability.",[48,385,386],{},"In contrast, traditional streaming systems—like Kafka and Redpanda—depend on leader-based architectures, which drive up inter-zone traffic costs due to replication and client communication. Ursa mitigates these costs by:",[339,388,389,392],{},[342,390,391],{},"Eliminating inter-zone traffic costs via a leaderless architecture.",[342,393,394],{},"Replacing costly inter-zone replication with direct writes to cloud storage using open lakehouse formats.",[40,396,398],{"id":397},"how-ursa-eliminates-inter-zone-traffic","How Ursa Eliminates Inter-Zone Traffic",[48,400,401],{},"Ursa minimizes inter-zone traffic by leveraging a leaderless architecture, which eliminates inter-zone communication between clients and brokers, and lakehouse-native storage, which removes the need for inter-zone data replication. 
This approach ensures high availability and scalability while avoiding unnecessary cross-zone data movement.",[48,403,404],{},[351,405],{"alt":18,"src":406},"\u002Fimgs\u002Fblogs\u002F679c602e21b3571bb7117dca_AD_4nXd7Oahc77NjRLNvA9clLt0tsyU6MrIqVibFYv5pW5giTIcCHPr3EA_yTGzfVEUIVO3VXK56qWK8zmBCp5lY0E_4nmlWIPFrHjtHylA5NhwELjn-UB0fLG2h_kbrxrc7Cs_edvveNA.png",[32,408,410],{"id":409},"leaderless-architecture","Leaderless architecture",[48,412,413],{},"Traditional streaming engines such as Kafka, Pulsar, or RedPanda rely on a leader-based model, where each partition is assigned to a single leader broker that handles all writes and reads.",[48,415,416],{},"Pros of Leader-Based Architectures:\n✔ Maintains message ordering via local sequence IDs\n✔ Delivers low latency and high performance through message caching",[48,418,419],{},"Cons of Leader-Based Architectures:\n✖ Throughput bottlenecked by a single broker per partition\n✖ Inter-zone traffic required for high availability in multi-AZ deployments",[48,421,422],{},"While Kafka and Pulsar offer partial solutions (e.g., reading from followers, shadow topics) to reduce read-related inter-zone traffic, producers still send data to a single leader.",[48,424,425,426,431],{},"Ursa removes the concept of topic ownership, allowing any broker in the cluster to handle reads or writes for any partition. The primary challenge—ensuring message ordering—is solved with ",[55,427,430],{"href":428,"rel":429},"https:\u002F\u002Fgithub.com\u002Fstreamnative\u002Foxia",[264],"Oxia",", a scalable metadata and index service created by StreamNative in 2022.",[32,433,435],{"id":434},"oxia-the-metadata-layer-enabling-leaderless-architecture","Oxia: The Metadata Layer Enabling Leaderless Architecture",[48,437,438],{},"Ensuring message ordering in a leaderless architecture is complex, but Ursa solves this with Oxia:",[339,440,441,444,447],{},[342,442,443],{},"Handles millions of metadata\u002Findex operations per second",[342,445,446],{},"Generates sequential IDs to maintain strict message ordering",[342,448,449],{},"Optimized for Kubernetes with horizontal scalability",[48,451,452],{},"Producers and consumers can connect to any broker within their local AZ, eliminating inter-zone traffic costs while maintaining performance through localized caching.",[32,454,456],{"id":455},"zero-interzone-data-replication","Zero interzone data replication",[48,458,459],{},"In most distributed systems, data replication from a leader (primary) to followers (replicas) is crucial for fault tolerance and availability. 
However, replication across zones can inflate infrastructure expenses substantially.",[48,461,462],{},"Ursa avoids these costs by writing data directly to cloud storage (e.g., AWS S3, Google GCS):",[339,464,465,468,471],{},[342,466,467],{},"Built-In Resilience: Cloud storage inherently offers high availability and fault tolerance without inter-zone traffic fees.",[342,469,470],{},"Tradeoff: Slightly higher latency (sub-second, with p99 at 500 milliseconds) compared to local disk\u002FEBS (single-digit to sub-100 milliseconds), in exchange for significantly lower costs (up to 10x lower).",[342,472,473],{},"Flexible Modes: Ursa is an addition to the classic BookKeeper-based engine, providing users with the flexibility to optimize for either cost or low latency based on their workload requirements.",[48,475,476],{},"By foregoing conventional replication, Ursa slashes inter-zone traffic costs and associated complexities—making it a compelling option for organizations seeking to balance high-performance data streaming with strict budget constraints.",[40,478,480],{"id":479},"how-we-ran-a-5-gbs-test-with-ursa","How We Ran a 5 GB\u002Fs Test with Ursa",[32,482,484],{"id":483},"ursa-cluster-deployment","Ursa Cluster Deployment",[339,486,487,490],{},[342,488,489],{},"9 brokers across 3 availability zones, each on m6i.8xlarge (Fixed 12.5 Gbps bandwidth, 32 vCPU cores, 128 GB memory).",[342,491,492],{},"Oxia cluster (metadata store) with 3 nodes of m6i.8xlarge, distributed across three availability zones (AZs).",[48,494,495],{},"During peak throughput (5 GB\u002Fs), each broker’s network usage was about 10 Gbps.",[32,497,499],{"id":498},"openmessaging-benchmark-workers-configuration","OpenMessaging Benchmark Workers & Configuration",[48,501,502,503,507],{},"The OpenMessaging Benchmark(OMB) Framework is a suite of tools that make it easy to benchmark distributed messaging systems in the cloud. Please check ",[55,504,505],{"href":505,"rel":506},"https:\u002F\u002Fopenmessaging.cloud\u002Fdocs\u002Fbenchmarks\u002F",[264]," for details.",[339,509,510,525,534],{},[342,511,512,513,518,519,524],{},"12 OMB workers: 6 for ",[55,514,517],{"href":515,"rel":516},"https:\u002F\u002Fgist.github.com\u002Fcodelipenghui\u002Fd1094122270775e4f1580947f80c5055",[264],"producers",", 6 for ",[55,520,523],{"href":521,"rel":522},"https:\u002F\u002Fgist.github.com\u002Fcodelipenghui\u002F06bada89381fb77a7862e1b4c1d8963d",[264],"consumers"," across 3 availability zones, on m6i.8xlarge instances. 
Each worker is configured with 12 CPU cores and 48 GB memory.",[342,526,527,528,533],{},"Sample YAML ",[55,529,532],{"href":530,"rel":531},"https:\u002F\u002Fgist.github.com\u002Fcodelipenghui\u002F204c1f26c4d44a218ae235bf2de99904",[264],"scripts"," provided for Kafka-compatible configuration and rate limits.",[342,535,536],{},"Achieved consistent 5 GB\u002Fs publish\u002Fsubscribe throughput.",[40,538,540],{"id":539},"ursa-benchmark-tests-results","Ursa Benchmark Tests & Results",[48,542,543],{},"The following diagram demonstrates that Ursa can consistently handle 5 GB\u002Fs of traffic, fully saturating the network across all broker nodes.",[48,545,546],{},[351,547],{"alt":18,"src":548},"\u002Fimgs\u002Fblogs\u002F679c602d7b261bac1113f7d6_AD_4nXdDPsRc3koXICiFF0bqSmGWbJt_RlUy4FE3ruuWOfbCfpcqZ1dejjqGbkaCJv2hQFL1nirRouBVRW2l5uMWBvY9naMqGB_wHcLI14dBM0f85TXhmdm3UxEv1yGX9Y4hf5FttSkZew.png",[40,550,552],{"id":551},"comparing-infrastructure-cost","Comparing Infrastructure Cost",[48,554,555],{},"This benchmark first evaluates infrastructure costs of running a 5 GB\u002Fs streaming workload (1:1 producer-to-consumer ratio) across different data streaming engines, including Ursa, Redpanda, and AWS MSK, with a focus on multi-AZ deployments to ensure a fair comparison.",[32,557,559],{"id":558},"test-setup-key-assumptions","Test Setup & Key Assumptions",[48,561,562],{},"All tests use multi-AZ configurations, with clusters and clients distributed across three AWS availability zones (AZs). Cluster size scales proportionally to the number of AZs, and rack-awareness is enabled for all engines to evenly distribute topic partitions and leaders.",[48,564,565],{},"To ensure a fair comparison, we selected the same machine type capable of fully utilizing both network and storage bandwidth for Ursa and Redpanda in this 5GB\u002Fs test:",[339,567,568],{},[342,569,570],{},"9 × m6i.8xlarge instances",[48,572,573,574,579],{},"However, MSK's storage bandwidth limits vary depending on the selected instance type, with the highest allowed limit capped at 1000 MiB\u002Fs per broker, according to",[55,575,578],{"href":576,"rel":577},"https:\u002F\u002Fdocs.aws.amazon.com\u002Fmsk\u002Flatest\u002Fdeveloperguide\u002Fmsk-provision-throughput-management.html#throughput-bottlenecks",[264]," AWS documentation",". 
Given this constraint, achieving 5 GB\u002Fs throughput with a replication factor of 3 required the following setup:",[339,581,582],{},[342,583,584],{},"15 × kafka.m7g.8xlarge (32 vCPUs, 128 GB memory, 15 Gbps network, 4000 GiB EBS).",[48,586,587],{},"This configuration was necessary to work around MSK's storage bandwidth limitations, ensuring a comparable cost basis to other evaluated streaming engines.",[48,589,590],{},"Additional key assumptions include:",[339,592,593,596,599],{},[342,594,595],{},"Inter-AZ producer traffic: For leader-based engines, two-thirds of producer-to-broker traffic crosses AZs due to leader distribution.",[342,597,598],{},"Consumer optimizations: Follower fetch is enabled across all tests, eliminating inter-AZ consumer traffic.",[342,600,601],{},"Storage cost exclusions: This benchmark only evaluates streaming costs, assuming no long-term data retention.",[32,603,605],{"id":604},"inter-broker-replication-costs","Inter-Broker Replication Costs",[48,607,608],{},"Inter-broker (cross-AZ) replication is a major cost driver for data streaming engines:",[339,610,611,614,617],{},[342,612,613],{},"RedPanda: Inter-broker replication is not free, leading to substantial costs when data must be copied across multiple availability zones.",[342,615,616],{},"AWS MSK: Inter-broker replication is free, but MSK instance pricing is significantly higher (e.g., $3.264 per hour for kafka.m7g.8xlarge vs $1.306 per hour for an on-demand m7g.8xlarge). The storage price of MSK is $0.10 per GB-month which is significantly higher than st1, which costs $0.045 per GB-month. Even though replication is free, client-to-broker traffic still incurs inter-AZ charges.",[342,618,619],{},"Ursa: No inter-broker replication costs due to its leaderless architecture, eliminating inter-zone replication costs entirely.",[32,621,623],{"id":622},"zone-affinity-reducing-inter-az-costs","Zone Affinity: Reducing Inter-AZ Costs",[48,625,626],{},"We evaluated zone affinity mechanisms to further reduce inter-AZ data transfer costs.",[48,628,629],{},"Consumers:",[339,631,632],{},[342,633,634],{},"Follower fetch is enabled across all tests, ensuring consumers fetch data from replicas in their local AZ—eliminating inter-zone consumer traffic except for metadata lookups",[48,636,637],{},"Producers:",[339,639,640,649,658],{},[342,641,642,643,648],{},"Kafka protocol lacks an easy way to enforce producer AZ affinity (though ",[55,644,647],{"href":645,"rel":646},"https:\u002F\u002Fcwiki.apache.org\u002Fconfluence\u002Fdisplay\u002FKAFKA\u002FKIP-1123:+Rack-aware+partitioning+for+Kafka+Producer",[264],"KIP-1123"," aims to address this). And it only works with the default partitioner (i.e., when no record partition or record key is specified).",[342,650,651,652,657],{},"Redpanda recently introduced ",[55,653,656],{"href":654,"rel":655},"https:\u002F\u002Fdocs.redpanda.com\u002Fredpanda-cloud\u002Fdevelop\u002Fproduce-data\u002Fleader-pinning\u002F",[264],"leader pinning",", but this only benefits setups where producers are confined to a single AZ—not applicable to our multi-AZ benchmark.",[342,659,660,661,666],{},"Ursa is the only system in this test with ",[55,662,665],{"href":663,"rel":664},"https:\u002F\u002Fdocs.streamnative.io\u002Fdocs\u002Fconfig-kafka-client#eliminate-cross-az-networking-traffic",[264],"built-in zone affinity for both producers and consumers",". 
It achieves this by embedding producer AZ information in client.id, allowing metadata lookups to route clients to local-AZ brokers, eliminating inter-AZ producer traffic.",[32,668,670],{"id":669},"cost-comparison-results","Cost Comparison Results",[48,672,337],{},[339,674,675,677],{},[342,676,344],{},[342,678,347],{},[48,680,681],{},"Ursa’s leaderless architecture, zone affinity, and native cloud storage integration deliver unparalleled cost efficiency, making it the most cost-effective choice for high-throughput data streaming workloads.",[48,683,684],{},[351,685],{"alt":18,"src":686},"\u002Fimgs\u002Fblogs\u002F679c72208198ca36a352f228_AD_4nXeeZuM8T-xBlD4Vf3j67K618n08qh8wIDLLtiLJG0ssA1Wj1V26u7wIDTX9sqLrtw8mB2c299dwzarGen62CG0Vh7nWstn5qbPGFcBaKJYEepTsLr5fHWv1U8uqbg8Y0UOK6fJ7.png",[48,688,689],{},[351,690],{"alt":18,"src":691},"\u002Fimgs\u002Fblogs\u002F679c625978031f40229de484_AD_4nXdLkLLJ30KKr-_A_rN1j8akVwBYacAWIPzWHoOReJF421890kfByZoQQxkLczihVSmiw5Q9J51-V9I2SEKITbwsYnANDDTlAVL5nQ_jfaHNTe9VEWhSoa7DZooCnilDYL6l6msmJg.png",[48,693,694],{},"The detailed infrastructure cost calculations for each data streaming engine are listed below:",[32,696,698],{"id":697},"streamnative-ursa","StreamNative - Ursa",[339,700,701,704,707,710],{},[342,702,703],{},"Server EC2 costs: 9 * $1.536\u002Fhr = $14",[342,705,706],{},"Client EC2 costs: 9 * $1.536\u002Fhr =$14",[342,708,709],{},"S3 write requests costs: 1350 r\u002Fs * $0.005\u002F1000r * 3600s = $24",[342,711,712],{},"S3 read requests costs: 1350 r\u002Fs * $0.0004\u002F1000r * 3600s = $2",[32,714,716],{"id":715},"aws-msk","AWS MSK",[339,718,719,722,725],{},[342,720,721],{},"Server EC2 costs: 15 * $3.264\u002Fhr = $49",[342,723,724],{},"Client side EC2 costs: 9 * $1.536\u002Fhr =$14",[342,726,727],{},"Interzone traffic - producer to broker: 5GB\u002Fs * ⅔ * $0.02\u002FG(in+out) * 3600 = $240",[32,729,731],{"id":730},"redpanda","RedPanda",[339,733,734,736,738,741,744],{},[342,735,703],{},[342,737,706],{},[342,739,740],{},"Interzone traffic - producer to broker: 5GB\u002Fs * ⅔ * $0.02\u002FGB(in+out) * 3600 = $240",[342,742,743],{},"Interzone traffic - replication: 10GB\u002Fs * $0.02\u002FGB(in+out) * 3600 = $720",[342,745,746],{},"Interzone traffic - broker to consumer: $0 (fetch from local zone)",[48,748,749,750,755],{},"Please note that we were unable to test ",[55,751,754],{"href":752,"rel":753},"https:\u002F\u002Fwww.redpanda.com\u002Fblog\u002Fcloud-topics-streaming-data-object-storage",[264],"Redpanda with Cloud Topics",", as it remains an announced but unreleased feature and is not yet available for evaluation. Based on the limited information available, while Cloud Topics may help optimize inter-zone data replication costs, producers still need to traverse inter-availability zones to connect to the topic partition owners and incur inter-zone traffic costs of up to $240 per hour.",[339,757,758,764],{},[342,759,760,763],{},[55,761,647],{"href":645,"rel":762},[264]," (when implemented) will help mitigate producer-to-broker inter-zone traffic, but it is not yet available. And it only works with the default partitioner (no record partition or key is specified).",[342,765,766],{},"Redpanda’s leader pinning helps only when all producers for the pinned topic are confined to a single AZ. In multi-AZ environments (like our benchmark), inter-zone producer traffic remains unavoidable.",[48,768,769],{},"Additionally, Redpanda’s Cloud Topics architecture is not documented publicly. 
# Pulsar Newbie Guide for Kafka Engineers (Part 4): Subscriptions & Consumers

*By Penghui Li, Hang Chen, and Neng Lu*

**TL;DR**

This post dives into how Apache Pulsar handles subscriptions and consumers – Pulsar's equivalent of Kafka's consumer groups. Pulsar requires consumers to specify a subscription name, which acts like a consumer group ID in Kafka. You can have multiple subscriptions on the same topic (for multi-group fan-out), and each subscription can have one or more consumer instances attached. Pulsar offers four subscription types – Exclusive, Failover, Shared, and Key_Shared – that determine how messages are delivered to consumers. Exclusive (the default) and Failover ensure that only one consumer (or one active consumer at a time) receives all messages of one or more partitions, preserving order as Kafka does. Shared and Key_Shared allow multiple consumers to split the messages of a partition: Shared distributes messages round-robin (like a queue – higher throughput, but no global ordering guarantee), while Key_Shared also distributes messages across consumers but guarantees ordering per message key. Pulsar's broker tracks a subscription cursor (like an offset) for each subscription to record where consumers left off, and unacknowledged messages form a backlog (analogous to Kafka's consumer lag) that you can monitor. In short, Pulsar's flexible subscription model lets you achieve Kafka-like streaming and RabbitMQ-like queuing patterns on the same platform.

## Understanding Pulsar Subscriptions vs Kafka Consumer Groups

If you come from Kafka, you're used to consumer groups: a named group of consumers in which each partition is consumed by one member of the group. Pulsar approaches this concept with subscriptions. A subscription in Pulsar is essentially a named rule for consuming a topic – think of it as a durable consumer group on a single topic. Consumers subscribe by specifying a subscription name, and Pulsar delivers messages according to the subscription's type (more on types soon). Under the hood, when a subscription is created, [Pulsar sets up a cursor to track the subscription's position in the topic](https://pulsar.apache.org/docs/next/concepts-messaging/#:~:text=,record%20the%20last%20consumed%20position). This cursor is stored durably (in BookKeeper), so if the consumer disconnects or the broker restarts, the subscription's last read position is remembered. In Kafka, the consumer group's offsets serve a similar purpose (typically stored in an internal topic). In Pulsar, the broker itself manages the offsets (cursors), which simplifies offset management: you don't need an external store or manual offset commits; the position is advanced by the act of acknowledging messages.
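To make the mapping concrete, here is a minimal sketch using the Apache Pulsar Java client; the service URL, topic, and subscription name are placeholders rather than values taken from this series:

```java
import org.apache.pulsar.client.api.Consumer;
import org.apache.pulsar.client.api.Message;
import org.apache.pulsar.client.api.PulsarClient;

public class SubscribeExample {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650") // placeholder broker URL
                .build();

        // The subscription name plays the role of Kafka's group.id: the broker
        // creates (or re-attaches to) a durable cursor stored under this name.
        Consumer<byte[]> consumer = client.newConsumer()
                .topic("persistent://public/default/orders") // placeholder topic
                .subscriptionName("serviceA")
                .subscribe();

        Message<byte[]> msg = consumer.receive();
        // Acknowledging advances the subscription's cursor, roughly the way
        // committing an offset advances a Kafka consumer group's position.
        consumer.acknowledge(msg);

        consumer.close();
        client.close();
    }
}
```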
Because Pulsar decouples the subscription from the physical consumer, you can have multiple subscriptions on one topic simply by using different subscription names. Each subscription name represents an independent feed of the topic. For example, if two separate services need the same data, you can have one Pulsar topic with two subscriptions (say, "serviceA" and "serviceB"), and each subscription will receive every message published – effectively duplicating the stream, like two separate Kafka consumer groups reading the same topic. Pulsar keeps a cursor for each subscription, and each subscription has its own backlog (messages published to the topic that have not yet been acknowledged on that subscription). This is powerful: it means Pulsar inherently supports fan-out (pub-sub) as well as work-queue sharing patterns on the same data. You could have one subscription where a single consumer reads all messages (stream processing), and another subscription on the same topic where a pool of consumers share the messages (distributed queue processing).
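A sketch of this fan-out pattern (placeholder names again; the Shared type used for the second subscription is covered in detail later in this post):

```java
import org.apache.pulsar.client.api.Consumer;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.SubscriptionType;

public class FanOutExample {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")
                .build();
        String topic = "persistent://public/default/orders"; // placeholder

        // Subscription 1: a single stream processor sees every message, in order.
        Consumer<byte[]> analytics = client.newConsumer()
                .topic(topic)
                .subscriptionName("analytics")
                .subscriptionType(SubscriptionType.Exclusive) // the default
                .subscribe();

        // Subscription 2: an independent feed of the same topic, consumed as a
        // work queue by however many workers attach under this name.
        Consumer<byte[]> worker = client.newConsumer()
                .topic(topic)
                .subscriptionName("email-workers")
                .subscriptionType(SubscriptionType.Shared)
                .subscribe();

        // Every published message is delivered on BOTH subscriptions, each of
        // which tracks its own cursor and backlog.
        analytics.close();
        worker.close();
        client.close();
    }
}
```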
Let's clarify some terminology. Acknowledgment in Pulsar is the act of a consumer confirming that it has processed a message. When a message is acknowledged, the subscription's cursor moves forward, and the message is considered consumed for that subscription. Acking in Pulsar is analogous to committing an offset in Kafka; the CLI consumer acknowledges automatically, while with the client API you acknowledge each message after processing it. Importantly, Pulsar supports two acknowledgment modes: individual acks (ack each message) and cumulative acks (acknowledge all messages up to a given position in one go). Cumulative acks are useful in Exclusive/Failover subscriptions to advance the cursor in bulk, but they aren't supported in Shared mode (since out-of-order consumption makes them tricky). In practice, individual acking is the common case, and it is what a plain acknowledge call gives you.
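Continuing the first sketch's `consumer` and `msg`, the two modes look like this (a fragment, not a full program):

```java
// Individual ack: confirm exactly this message.
consumer.acknowledge(msg);

// Cumulative ack: confirm everything up to and including msg in one call.
// Only valid on Exclusive/Failover subscriptions; on Shared subscriptions
// consumption is interleaved across consumers, so the client rejects it.
consumer.acknowledgeCumulative(msg);
```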
Unacknowledged messages remain in the subscription backlog and will be redelivered later, or delivered to another consumer where the subscription type allows it. The backlog is essentially the number of messages pending acknowledgment – similar to consumer lag in Kafka (how far behind the tip of the log you are). You can view the backlog and other stats with `pulsar-admin topics stats`, as we saw in [the first blog](/blog/pulsar-newbie-guide-for-kafka-engineers-part-1-kafka---pulsar-cli-cheatsheet), which shows the cursor position and backlog for each subscription. Pulsar retains messages as long as at least one subscription hasn't acknowledged them (or until retention limits kick in). If a topic has no subscriptions, or if all subscriptions have acknowledged a message, that message can be deleted ([Pulsar doesn't require a fixed retention period if messages are consumed, unless you've enabled time- or size-based retention](https://pulsar.apache.org/docs/next/concepts-messaging/#:~:text=By%20default%2C%20messages%20of%20a,see%20message%20retention%20and%20expiry)). This is a key difference: Kafka stores messages for a time window regardless of consumption, whereas Pulsar by default can delete acknowledged data (making it more storage-efficient for queue use cases), while still letting you configure retention to keep data for replay if needed.
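The same numbers are available programmatically. Here is a sketch using the Pulsar Java admin client; the admin URL is a placeholder, and the getter-style stats API assumes a reasonably recent Pulsar release:

```java
import org.apache.pulsar.client.admin.PulsarAdmin;
import org.apache.pulsar.common.policies.data.TopicStats;

public class BacklogCheck {
    public static void main(String[] args) throws Exception {
        PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://localhost:8080") // placeholder admin URL
                .build();

        TopicStats stats = admin.topics()
                .getStats("persistent://public/default/orders");

        // One entry per subscription: each cursor has its own backlog, the
        // rough equivalent of per-group consumer lag in Kafka.
        stats.getSubscriptions().forEach((name, sub) ->
                System.out.println(name + " backlog=" + sub.getMsgBacklog()));

        admin.close();
    }
}
```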
Another difference is how new consumers start consuming. In Kafka, a new consumer group starts according to its auto.offset.reset policy (earliest or latest). In Pulsar, when you subscribe to a topic with a new subscription name, you also choose where to start: by default it's the latest position (meaning you only get messages published from that point onwards), but you can pass `-p Earliest` on the CLI (or set `subscriptionInitialPosition` in the client API) to consume from the beginning of the topic's backlog. We demonstrated this in our [first blog post](/blog/pulsar-newbie-guide-for-kafka-engineers-part-1-kafka---pulsar-cli-cheatsheet#:~:text=consume%20messages%3A) by using `-p Earliest` with the console consumer to read existing messages. So remember: if you create a new subscription and want to replay from the start, set the initial position accordingly; otherwise you might think "nothing is coming through" simply because the subscription is tailing new messages by default.
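In the client API, the same choice looks like this (a sketch with placeholder names; note the setting only takes effect when the subscription is first created):

```java
import org.apache.pulsar.client.api.Consumer;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.SubscriptionInitialPosition;

public class ReplayExample {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")
                .build();

        // Equivalent in spirit to auto.offset.reset=earliest for a new group.
        // For an existing subscription, the stored cursor wins and this
        // setting is ignored.
        Consumer<byte[]> replayer = client.newConsumer()
                .topic("persistent://public/default/orders")
                .subscriptionName("backfill") // a brand-new subscription name
                .subscriptionInitialPosition(SubscriptionInitialPosition.Earliest)
                .subscribe();

        replayer.close();
        client.close();
    }
}
```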
In summary, think of a Pulsar subscription as the combination of Kafka's consumer group concept and its offset-tracking mechanism, managed for you by Pulsar's broker. Next, let's explore the different subscription types that Pulsar offers – this is where Pulsar really shines in flexibility compared to Kafka.

## Subscription Types in Pulsar

Pulsar has four subscription types: Exclusive, Failover, Shared, and Key_Shared. These define how messages are delivered when multiple consumers attach to the same subscription name on a topic. (If consumers use different subscription names, they're completely isolated – each subscription sees all messages independently, as discussed above.) By choosing a subscription type, you control whether a subscription behaves more like a traditional stream (one consumer getting all messages in order), more like a queue (multiple consumers dividing up the messages), or a blend of both. Let's break down each type.
Acking in Pulsar is analogous to committing an offset in Kafka, but there’s no separate commit step to manage: the CLI consumer acknowledges automatically, and in the client APIs you simply call acknowledge once a message is processed. Importantly, Pulsar supports two acknowledgment modes: individual acks (ack each message) and cumulative acks (acknowledge all messages up to a given position in one go). Cumulative acks are useful in Exclusive\u002FFailover subscriptions to advance the cursor in bulk, but they aren’t supported in Shared mode (since out-of-order consumption makes it tricky). In practice, individual ack is common and is what the Pulsar client does by default. Unacknowledged messages remain in the subscription backlog and will be redelivered later or to other consumers if possible. The backlog is essentially the number of messages pending acknowledgment – similar to the idea of “consumer lag” in Kafka (how many messages behind the tip of the log you are). You can view the backlog and other stats with pulsar-admin topics stats as we saw in ",[55,1118,1120],{"href":1119},"the first blog",", which shows the subscription cursor position and backlog for each subscription. Pulsar will retain messages as long as there is at least one subscription that hasn’t acknowledged them (or until retention limits kick in). If a topic has no subscriptions or if all subscriptions have acknowledged a message, that message can be deleted (",[55,1123,1126],{"href":1124,"rel":1125},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002Fnext\u002Fconcepts-messaging\u002F#:~:text=By%20default%2C%20messages%20of%20a,see%20message%20retention%20and%20expiry",[264],"Pulsar doesn’t require a fixed retention period if messages are consumed, unless you’ve enabled time or size-based retention","). This is a key difference: Kafka stores messages for a time window regardless of consumption, whereas Pulsar by default can delete acknowledged data (making it more storage-efficient for queue use cases), while still allowing you to configure retention to keep data for replay if needed.",[48,1129,1130,1131,1135],{},"Another difference is how new consumers start consuming. In Kafka, when a new consumer group is created, it follows an auto.offset.reset policy (earliest or latest). In Pulsar, when you subscribe to a topic with a new subscription name, you also choose where to start: by default it’s at the latest position (meaning you only get messages published from that point onwards), but you can specify -p Earliest (or subscriptionInitialPosition in the client API) to consume from the beginning of the topic’s backlog. We demonstrated this in our ",[55,1132,1134],{"href":1133},"\u002Fblog\u002Fpulsar-newbie-guide-for-kafka-engineers-part-1-kafka---pulsar-cli-cheatsheet#:~:text=consume%20messages%3A","first blog post"," by using -p Earliest for the console consumer to read existing messages. So remember, if you create a new subscription and want to replay from the start, specify the initial position accordingly; otherwise you might think “nothing is coming through” simply because by default it’s tailing new messages only.",[48,1137,1138],{},"In summary, think of a Pulsar subscription as the combination of Kafka’s consumer group concept and offset tracking mechanism, managed for you by Pulsar’s broker.
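For reference, the -p Earliest behavior mentioned above maps to a single setting on the consumer builder in the Java client (a sketch reusing the PulsarClient from the earlier example; the topic and subscription names are illustrative):

```java
import org.apache.pulsar.client.api.*;

// A brand-new subscription starts at the latest position by default;
// this setting makes it replay the topic's existing backlog instead.
Consumer<String> consumer = client.newConsumer(Schema.STRING)
        .topic("persistent://public/default/my-topic")
        .subscriptionName("replay-sub")
        .subscriptionInitialPosition(SubscriptionInitialPosition.Earliest)
        .subscribe();
```

Keep in mind this setting only matters the first time the subscription is created; once the cursor exists, consumption always resumes from it.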
Next, let’s explore the different subscription types that Pulsar offers – this is where Pulsar really shines in flexibility compared to Kafka.",[40,1140,1142],{"id":1141},"subscription-types-in-pulsar","Subscription Types in Pulsar",[48,1144,1145],{},"Pulsar has four subscription types: Exclusive, Failover, Shared, and Key_Shared. These define how messages are delivered when multiple consumers attach to the same subscription name on a topic. (If consumers use different subscription names, they’re completely isolated – each subscription sees all messages independently, as discussed.) By choosing a subscription type, you control whether a subscription behaves more like a traditional stream (one consumer getting all messages in order) or a queue (multiple consumers dividing up messages) or a blend of both. Let’s break down each type:",[32,1147,1149],{"id":1148},"exclusive","Exclusive",[48,1151,1152],{},"Exclusive is the default subscription type in Pulsar. As the name suggests, an Exclusive subscription only allows one consumer at a time to attach to all the partitions of a topic. If a second consumer tries to subscribe with the same subscription name while an active consumer is already attached, the broker will refuse the second consumer (the consumer will get an error indicating the subscription is already taken). This is akin to a Kafka consumer group with a single member (and Kafka would similarly not use a second member if there’s only one partition – it would just sit idle). Exclusive subscriptions guarantee that the entire topic’s messages go to one consumer, preserving the message order end-to-end, since no other consumer is concurrently receiving messages.",[48,1154,1155],{},"Because only one consumer can consume, Exclusive subscriptions aren’t about scaling out consumption; they are useful for strict ordering or when you truly only want one consumer processing a given stream of data. One common pattern is to use multiple exclusive subscriptions on the same topic to implement a pub-sub fan-out: for example, you might have two different services that need the data from topic X. You can have Service A use subscription “subA” (exclusive) and Service B use subscription “subB” (exclusive). Both services will get all messages from topic X independently (since they are on different subscriptions), each in order for themselves. This is exactly how Pulsar enables pub-sub – multiple exclusive subscriptions on the same topic – analogous to having multiple Kafka consumer groups reading the same topic. The difference is that in Pulsar, the broker tracks the cursor for each subscription and retains messages until each subscription acknowledges them, so you get durable pub-sub with one topic rather than duplicating data. In short, Exclusive = one consumer at a time, simplest model. It’s also the fallback: if you don’t specify a subscription type, you’ll get Exclusive by default.",[32,1157,1159],{"id":1158},"failover","Failover",[48,1161,1162],{},"Failover subscriptions allow multiple consumers to attach to the same subscription, but still only one consumer actively receives messages from a partition at any given time. The idea is to have a primary consumer and one or more backup consumers. If the primary (master) consumer disconnects or becomes unreachable, one of the backups is promoted to be the new primary and continues consuming from where the previous one left off. 
This provides high availability for consumption: if you have a critical processing pipeline, you can run a standby consumer that will take over automatically if the main consumer fails, minimizing downtime or data buildup.",[48,1164,1165],{},"How does Pulsar choose the primary and the failover order? By default, it’s based on the order in which consumers subscribe (or you can assign each consumer a priority level). The first consumer to attach becomes the master for the topic (or for each partition of the topic, if partitioned – more on that in a second). Second becomes the next in line, and so on. All consumers beyond the first are in a standby mode – they are connected and ready, but they do not receive messages while the master is active. They typically sit idle (Pulsar might send occasional heartbeats to them to know they’re alive, but no actual message traffic).",[48,1167,1168],{},"When the master consumer disconnects (or you deliberately close it, or it crashes), Pulsar will automatically start delivering messages to the next consumer in line. Any messages that were sent to the original consumer but not acknowledged will be redelivered to the new consumer as well, so no messages are lost. The newly promoted consumer continues from the last acknowledged position of the previous one, maintaining continuity. Message order is preserved under Failover because at any given time, each partition’s messages are processed by a single consumer. It’s similar to Exclusive in that sense (one-at-a-time consumption), except it permits a standby to take over instantly on failure. In fact, from an ordering standpoint, Exclusive and Failover are the same (strict ordering); the difference is Failover gives you redundancy.",[48,1170,1171],{},"A key detail for those coming from Kafka: with partitioned topics, Failover will assign the master role per partition. This means if you have a topic with 10 partitions and two consumers in a failover subscription, Pulsar will try to balance such that each consumer is master for some of the partitions. For example, consumer A might be primary for partitions 0-4 and consumer B for partitions 5-9 (the assignment is done by the broker, trying to even it out). In that case, both consumers are actually active simultaneously, but on different partitions. If one consumer dies, the other will take over all partitions. This behavior is analogous to Kafka’s consumer group rebalancing (each consumer gets some partitions). However, if the topic is non-partitioned (a single partition essentially), then only one consumer (the first or highest priority) gets all the messages, and the others truly get nothing until failover occurs. So, Failover mode can act both as a pure hot-standby (in single-partition topics) or as a load-sharing mechanism across partitions (in multi-partition topics). The main point remains: for each partition, one consumer is doing the work at a time. This guarantees order per partition and no duplicate processing. (If a new consumer with higher priority joins, it can even preempt and become the master for partitions, but that’s an edge scenario.)",[48,1173,1174],{},"In practice, you’d use Failover when you need reliability – e.g., you have a critical consumer and you want a backup to seamlessly continue if the primary fails. It’s common in scenarios where processing order matters but you also want quick failover for HA. 
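In the Java client, a failover consumer differs from an Exclusive one only in the subscription type (a sketch with illustrative names, reusing the client from earlier; start two copies of this consumer to get a primary and a standby):

```java
import org.apache.pulsar.client.api.*;

// Both instances subscribe with the same name; the broker designates one
// as master per partition and keeps the others on standby.
Consumer<String> consumer = client.newConsumer(Schema.STRING)
        .topic("persistent://public/default/my-topic")
        .subscriptionName("mySub")
        .subscriptionType(SubscriptionType.Failover)
        .subscribe();
```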
If you tested this with the Pulsar CLI, you could do something like:",[339,1176,1177,1180],{},[342,1178,1179],{},"Terminal 1: pulsar-client consume -s mySub -t Failover -p Earliest -n 0 persistent:\u002F\u002Fpublic\u002Fdefault\u002Fmy-topic",[342,1181,1182],{},"Terminal 2: pulsar-client consume -s mySub -t Failover -p Earliest -n 0 persistent:\u002F\u002Fpublic\u002Fdefault\u002Fmy-topic",[48,1184,1185],{},"Both will connect. You then publish some messages (using pulsar-client produce). You’ll notice only one of the two terminals is printing the messages – that’s the master. If you stop Terminal 1 (the master), Terminal 2 will immediately start receiving any new messages. Any messages that Terminal 1 did not ack before it went down will be redelivered to Terminal 2 as well. This behavior confirms the failover: one active consumer at a time, automatic hand-off on failure. This is different from Kafka where if a consumer in a group dies, there is a rebalance delay and then other consumers resume partitions; Pulsar’s failover is near-instant for new messages because the standby is already connected and ready.",[48,1187,1188],{},"One caveat: if a failover happens at an awkward time, there is a possibility of a couple of messages being processed out of order or twice (for example, the old consumer got a batch of messages but crashed before acking some, and the new consumer might receive some of those messages again while the old one may have actually processed some before crashing). Pulsar’s documentation notes that in some cases you may see a duplicate or an out-of-order message around the switchover. But in general, failover mode is designed to hand off smoothly with minimal duplication.",[32,1190,1192],{"id":1191},"shared-round-robin","Shared (Round-Robin)",[48,1194,1195],{},"With a Shared subscription, multiple consumers can connect to the same subscription on a topic partition and receive messages concurrently. Unlike Exclusive\u002FFailover, where only one consumer gets all messages, in Shared mode the broker will round-robin dispatch messages to consumers. Each message from the topic goes to one of the consumers in the group (never to more than one), distributing the load. Effectively, this turns your topic + subscription into a work queue – multiple consumers are pulling from the same queue of messages, each handling different messages in parallel. This is great for scaling out processing: if one consumer instance isn’t fast enough to keep up with the topic’s throughput, you can add a second, third, etc., on the same subscription and Pulsar will spread the messages among them.",[48,1197,1198],{},"Because messages are distributed, ordering is not guaranteed across the subscription as a whole. If message A and then B are published, it’s possible A goes to Consumer 1 and B goes to Consumer 2, and Consumer 2 might process B before Consumer 1 processes A. There’s no coordination to preserve publish order in a Shared subscription – the goal is throughput and load balancing. If ordering is important, Shared might not be the right choice (or you’d need to ensure all related messages go to the same consumer, which is what Key_Shared is for). Shared subs also do not support cumulative ack (since each consumer may be at a different position in the stream, there’s no single “up to X” point that makes sense to ack collectively) – consumers should ack messages individually.",[48,1200,1201],{},"One of the big advantages of Shared mode is how it handles slow or stuck consumers. 
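For reference, a Shared worker in Java is just another subscription type plus, typically, an ack timeout (a sketch; the 30-second timeout is an illustrative value you would tune, and it drives the redelivery behavior described next):

```java
import java.util.concurrent.TimeUnit;
import org.apache.pulsar.client.api.*;

// Run several copies of this worker with the same subscription name;
// the broker round-robins messages across them.
Consumer<String> worker = client.newConsumer(Schema.STRING)
        .topic("persistent://public/default/tasks")
        .subscriptionName("workers")
        .subscriptionType(SubscriptionType.Shared)
        .ackTimeout(30, TimeUnit.SECONDS)   // unacked messages are redelivered after this
        .subscribe();
```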
Since each message is delivered to one consumer at a time, if that consumer fails to acknowledge (maybe it died or is hanging), Pulsar can detect that (via ack timeouts or the TCP connection closing) and will redeliver those unacked messages to another consumer in the group. For example, if Consumer A received Message 5 but never acked it (maybe Consumer A crashed), after a timeout, Pulsar will requeue Message 5 and send it to Consumer B (assuming Consumer B is healthy). This ensures that a bad consumer doesn’t black-hole messages – the work will be picked up by someone else. Meanwhile, other messages that were sent to other consumers can continue being processed; one slow consumer doesn’t block the others. This contrasts with Kafka’s model where if a consumer in a group slows down on a partition, that partition’s consumption lags behind (since Kafka won’t hand those messages to a different consumer unless the first consumer is considered dead and a rebalance happens). Pulsar’s Shared mode provides a more dynamic load balancing: each message is assigned to a consumer, and if that consumer can’t handle it, it can be reassigned. This is why Pulsar can achieve true queue semantics on a stream. It’s very much like how a RabbitMQ work queue would behave – many consumers pulling tasks off a queue, each ACKing tasks as done, and the system requeuing unacked tasks if a worker goes away.",[48,1203,1204],{},"In terms of usage, Shared subscriptions are ideal when you have independent messages and you want to maximize parallel processing. If ordering doesn’t matter (or you only care about per-message handling, not sequence), use Shared to scale out. For example, imagine a thumbnail generation service where each message is “generate a thumbnail for image X”. The order doesn’t matter at all – you just want to process as many as possible in parallel. A Pulsar topic with a Shared subscription and many consumers allows you to spin up N workers and they’ll automatically load balance the tasks. Each consumer will acknowledge as it finishes a message; the subscription’s cursor advances per message as a result of those acks (the cursor essentially will mark the message as consumed when acked, but since messages might be out-of-order, the cursor might have holes – which is fine, those holes are the backlog of unacked messages). The Pulsar admin stats will show how many messages are in backlog (i.e., not yet acked). In a healthy steady state, backlog stays near zero as consumers keep up; if consumers fall behind, backlog grows (like a queue depth). You can always add more consumers to that subscription to catch up if needed – Pulsar will incorporate them and start sharing messages with the new consumers immediately.",[32,1206,1208],{"id":1207},"key_shared","Key_Shared",[48,1210,1211],{},"Key_Shared is the newest addition (relative to others) to Pulsar’s subscription types. It’s like an enhanced version of Shared that strikes a balance between ordering and parallelism. In a Key_Shared subscription, multiple consumers can attach and all will receive messages, but messages that share the same key will always go to the same consumer. In other words, Pulsar will hash or map message keys to specific consumers, and ensure that the order of messages for each key is preserved on that consumer. 
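In code, Key_Shared needs only two things: producers set a key on each message, and consumers pick the Key_Shared type. Here’s a sketch with illustrative names, assuming producer is a Producer&lt;String&gt; created on the same topic:

```java
import org.apache.pulsar.client.api.*;

// Producer side: the key (e.g. a user ID) determines which consumer gets the message.
producer.newMessage()
        .key("user_789")
        .value("clicked_checkout")
        .send();

// Consumer side: each instance is assigned a slice of the key space,
// and messages within a key arrive in publish order.
Consumer<String> consumer = client.newConsumer(Schema.STRING)
        .topic("persistent://public/default/user-events")
        .subscriptionName("per-user-processing")
        .subscriptionType(SubscriptionType.Key_Shared)
        .subscribe();
```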
If that consumer disconnects, the messages for that key will be routed to another consumer, but always in a way that maintains the ordering from the last acked message onwards (Pulsar will not suddenly deliver older unacked messages of that key to a new consumer out of order).",[48,1213,1214],{},"This mode is extremely useful when your messages have some natural key (like a user ID, order ID, or device ID) and you want to ensure all messages for that entity are processed in order, but you don’t care about ordering across different entities. Kafka achieves something similar by requiring you to put all messages for an entity on the same partition – which then ties parallelism to partition count. Pulsar’s Key_Shared does this dynamically with consumers: you could have a single topic (single partition if you want) and still scale out consumption by key. The broker handles the assignment of keys to consumers. In fact, if you add more consumers, Pulsar can redistribute the hash ranges of keys among them automatically. If a consumer leaves, its key range is taken over by others. This all happens behind the scenes, giving you the effect of partitioning without manual partition management.",[48,1216,1217],{},"From the application perspective, Key_Shared means: “I have multiple consumers, but I want to ensure no two consumers ever process the same key’s messages concurrently or out of order.” It provides ordering per key and load-balancing across keys. A classic use case might be an event stream where events are tagged with a customer ID and you want per-customer ordering (maybe to avoid race conditions updating a customer’s state), but you also want to process different customers in parallel. With Kafka, you’d need as many partitions as you have parallelism (and all messages for a customer must go to the same partition). With Pulsar Key_Shared, you can spin up multiple consumers for a topic and Pulsar will ensure messages with the same key always go to the same consumer. For example, imagine tracking user activity where each message has a user_id as the key. With 10 consumers, all events for key \"user_789\" will consistently go to the same consumer (say, Consumer #3), while other users like \"user_456\" and \"user_123\" are each consistently routed to their own assigned consumers. When you scale to 20 consumers, \"user_789\" might get reassigned to Consumer #7, but all of that user’s events will still go to just that one consumer. This gives you parallel processing across different users while maintaining strict ordering per individual user. The key-to-consumer assignment is handled automatically by one of Pulsar's hash-based key distribution strategies (like auto-split ranges or consistent hashing), but you usually don’t need to worry about the exact algorithm as a user – just know that it balances keys.",[48,1219,1220],{},"In summary, Key_Shared = like Shared (multiple consumers in parallel), but with ordering guaranteed on a per-key basis. It’s the best of both worlds for many scenarios, giving you scaling with correctness. Key_Shared is often recommended when your use case can leverage message keys to delineate order boundaries – for instance, any stateful processing per entity should use Key_Shared if you want to scale out that processing. If ordering doesn’t matter at all, plain Shared is fine; if global order matters, you’d stick to Exclusive\u002FFailover.
Key_Shared fills the gap of “order matters per entity, but not globally.”",[32,1222,1224],{"id":1223},"putting-it-all-together","Putting it All Together",[48,1226,1227],{},"The beauty of Pulsar is that you can mix and match these subscription types to fit your needs, even on the same topic. For example, you could have one subscription on a topic using Key_Shared with 5 consumers processing events in parallel, and another subscription on the same topic using Exclusive to feed a separate system that needs the full ordered stream. The publisher only writes the message once, but Pulsar can deliver it in multiple ways to different subscribers. This is something not easily done in Kafka without duplicating data or using external systems – Pulsar’s design cleanly separates the publish side from the subscribe side through these named subscriptions.",[48,1229,1230],{},"To reinforce these concepts, it’s worth comparing with Kafka’s approach:",[339,1232,1233,1236,1239],{},[342,1234,1235],{},"In Kafka, if you want to do pub-sub (fan-out), you typically create multiple consumer groups. In Pulsar, you create multiple subscriptions (which is effectively the same idea). Each subscription has its own cursor and backlog.",[342,1237,1238],{},"In Kafka, if you want a work-queue pattern, you’d create a consumer group with multiple consumers. Kafka will then assign partitions to consumers (effectively, you can’t have more consumers than partitions) and you get parallelism at the partition level, but strict ordering within each partition. If one message in a partition is slow or causes an error, it blocks everything behind it in that partition until it’s handled or skipped. In Pulsar, for a work queue, you use a Shared subscription on a topic (which could even be a single-partition topic). You get parallelism per message, not just per partition, and a slow message doesn’t block others – it can be retried elsewhere while other messages still flow to other consumers. This is a major difference in the consumption model and is one of Pulsar’s key advantages for certain workloads.",[342,1240,1241],{},"Key_Shared doesn’t really have a direct equivalent in Kafka. Kafka would require you to partition by key to get key-ordering, but that then ties you to a static number of partitions and possibly uneven key distribution. Pulsar’s Key_Shared is more flexible and dynamic in that regard (you can increase consumers on the fly and it will redistribute keys, whereas Kafka’s partition count is fixed once a topic is created, unless you manually add partitions – a heavyweight operation that can cause ordering issues of its own for existing keys).",[40,1243,931],{"id":930},[48,1245,1246],{},"In this part, we covered and clarified Pulsar’s subscription and consumer mechanics. We learned that Pulsar’s subscription name is analogous to Kafka’s consumer group – it’s how Pulsar tracks a consumer or group of consumers reading a topic. Pulsar’s broker maintains a subscription cursor for each subscription to know which messages have been acknowledged (processed), ensuring durability and allowing consumers to pick up where they left off after disconnects.
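To make the “mix and match” idea concrete in code, here is a sketch of one topic simultaneously feeding an ordered Exclusive subscription and a parallel Shared one (topic and subscription names are illustrative, reusing the client from earlier):

```java
import org.apache.pulsar.client.api.*;

// Subscription 1: a single consumer that sees the full ordered stream.
Consumer<String> auditFeed = client.newConsumer(Schema.STRING)
        .topic("persistent://public/default/orders")
        .subscriptionName("audit")                 // Exclusive is the default type
        .subscribe();

// Subscription 2: a pool of workers sharing the same messages as a queue.
Consumer<String> worker = client.newConsumer(Schema.STRING)
        .topic("persistent://public/default/orders")
        .subscriptionName("fulfillment-workers")
        .subscriptionType(SubscriptionType.Shared)
        .subscribe();
```

Each subscription keeps its own cursor and backlog, so the two consumption patterns never interfere with each other.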
We also reviewed the four subscription types in Pulsar and how they map to messaging patterns:",[339,1248,1249,1252,1255,1258],{},[342,1250,1251],{},"Exclusive: A single consumer, like a Kafka consumer group with only one member (ensures total order).",[342,1253,1254],{},"Failover: A single active consumer with standbys that take over on failure (ensures order, with quick failover on consumer loss).",[342,1256,1257],{},"Shared: Multiple consumers competing for messages in a queue-like fashion (higher throughput via parallelism, no overall order guarantee, built-in redelivery of unacked messages).",[342,1259,1260],{},"Key_Shared: Multiple consumers with ordering per key (best for parallel processing when per-key order matters, effectively combining ordering and load balancing).",[48,1262,1263],{},"Pulsar gives you the freedom to use the right tool for the job – or even use both at the same time on the same data. You can have stream processing and queue processing co-exist on one topic through different subscriptions. This flexibility is one of the reasons Kafka engineers find Pulsar intriguing: it’s like having Kafka and RabbitMQ in one system. By leveraging subscription types, you can implement complex messaging workflows without deploying multiple platforms.",[48,1265,1266],{},"Now that you understand subscriptions and consumers in Pulsar, you’re well-equipped to build systems that take advantage of Pulsar’s dual nature of streaming and queuing. In the next part of the Pulsar Newbie Guide, we’ll continue our journey (stay tuned!). Meanwhile, feel free to experiment with subscription settings in a test environment to solidify your understanding – Pulsar’s CLI and admin tools make it easy to observe how messages flow under each mode. Happy Pulsar-ing!",[48,1268,1269],{},"‍",[208,1271],{},[48,1273,1269],{},[48,1275,1276,1277,1282],{},"Want to go deeper into real-time data and streaming architectures? Join us at the ",[55,1278,1281],{"href":1279,"rel":1280},"https:\u002F\u002Fdatastreaming-summit.org\u002Fevent\u002Fdata-streaming-sf-2025",[264],"Data Streaming Summit San Francisco 2025"," on September 29–30 at the Grand Hyatt at SFO.",[48,1284,1285],{},"30+ sessions | 4 tracks | Real-world insights from OpenAI, Netflix, LinkedIn, PayPal, Uber, AWS, Google, Motorq, Databricks, Ververica, Confluent & more!",[48,1287,1288],{},[55,1289,1292],{"href":1290,"rel":1291},"https:\u002F\u002Fdatastreaming-summit.org\u002Fevent\u002Fdata-streaming-sf-2025\u002Fschedule",[264],"[Explore the Full Agenda]",[48,1294,1295],{},[55,1296,1299],{"href":1297,"rel":1298},"https:\u002F\u002Fwww.eventbrite.com\u002Fe\u002Fdata-streaming-summit-san-francisco-2025-tickets-1432401484399?aff=oddtdtcreator",[264],"[Register Now]",[48,1301,1269],{},{"title":18,"searchDepth":19,"depth":19,"links":1303},[1304,1305,1312],{"id":1100,"depth":19,"text":1101},{"id":1141,"depth":19,"text":1142,"children":1306},[1307,1308,1309,1310,1311],{"id":1148,"depth":279,"text":1149},{"id":1158,"depth":279,"text":1159},{"id":1191,"depth":279,"text":1192},{"id":1207,"depth":279,"text":1208},{"id":1223,"depth":279,"text":1224},{"id":930,"depth":19,"text":931},"Apache Pulsar","2025-08-29","Part 4 of the Pulsar Newbie Guide for Kafka Engineers explores how Apache Pulsar handles subscriptions and consumers—its equivalent to Kafka consumer groups.
Learn how Pulsar’s four subscription types (Exclusive, Failover, Shared, and Key_Shared) enable both streaming and queuing patterns, offering greater flexibility, scalability, and ordering guarantees than Kafka alone.","\u002Fimgs\u002Fblogs\u002F68b1b4c6fd73cfc228f21a9a_04.-Subscriptions-&-Consumers-1.png",{},"\u002Fblog\u002Fpulsar-newbie-guide-for-kafka-engineers-part-4-subscriptions-consumers","8 min read",{"title":1088,"description":1315},"blog\u002Fpulsar-newbie-guide-for-kafka-engineers-part-4-subscriptions-consumers",[1313,1323,1084],"Intro","gyuiHbKz9kgPqmIzwjAvChx-sryMOrT4eCdmvX4WigQ",[1326,1342,1356],{"id":1327,"title":313,"bioSummary":1328,"email":289,"extension":8,"image":1329,"linkedinUrl":1330,"meta":1331,"position":1338,"stem":1339,"twitterUrl":1340,"__hash__":1341},"authors\u002Fauthors\u002Fpenghui-li.md","Penghui Li is passionate about helping organizations to architect and implement messaging services. Prior to StreamNative, Penghui was a Software Engineer at Zhaopin.com, where he was the leading Pulsar advocate and helped the company adopt and implement the technology. He is an Apache Pulsar Committer and PMC member.","\u002Fimgs\u002Fauthors\u002Fpenghui-li.webp","https:\u002F\u002Fwww.linkedin.com\u002Fin\u002Fpenghui-li-244173184\u002F",{"body":1332},{"type":15,"value":1333,"toc":1336},[1334],[48,1335,1328],{},{"title":18,"searchDepth":19,"depth":19,"links":1337},[],"Director of Streaming, StreamNative & Apache Pulsar PMC Member","authors\u002Fpenghui-li","https:\u002F\u002Ftwitter.com\u002Flipenghui6","WDjET7GfxqVQJ8mTEMaRhgpxRdDy18qZkgQDJlwjvbI",{"id":1343,"title":312,"bioSummary":1344,"email":289,"extension":8,"image":1345,"linkedinUrl":289,"meta":1346,"position":1353,"stem":1354,"twitterUrl":289,"__hash__":1355},"authors\u002Fauthors\u002Fhang.md","Hang Chen, an Apache Pulsar and BookKeeper PMC member, is Director of Storage at StreamNative, where he leads the design of next-generation storage architectures and Lakehouse integrations. His work delivers scalable, high-performance infrastructure powering modern cloud-native event streaming platforms.","\u002Fimgs\u002Fauthors\u002Fhang.webp",{"body":1347},{"type":15,"value":1348,"toc":1351},[1349],[48,1350,1344],{},{"title":18,"searchDepth":19,"depth":19,"links":1352},[],"Director of Storage, StreamNative & Apache Pulsar PMC Member","authors\u002Fhang","titaSDxZRJWAW0SkpJSq43NuDvps9XQ6gZIMSPCtUwo",{"id":1357,"title":311,"bioSummary":1358,"email":289,"extension":8,"image":1359,"linkedinUrl":289,"meta":1360,"position":1369,"stem":1370,"twitterUrl":1371,"__hash__":1372},"authors\u002Fauthors\u002Fneng-lu.md","Neng Lu is currently the Director of Platform at StreamNative, where he leads the engineering team in developing the StreamNative ONE Platform and the next-generation Ursa engine. As an Apache Pulsar Committer, he specializes in advancing Pulsar Functions and Pulsar IO Connectors, contributing to the evolution of real-time data streaming technologies. Prior to joining StreamNative, Neng was a Senior Software Engineer at Twitter, where he focused on the Heron project, a cutting-edge real-time computing framework. 
He holds a Master's degree in Computer Science from the University of California, Los Angeles (UCLA) and a Bachelor's degree from Zhejiang University.","\u002Fimgs\u002Fauthors\u002Fneng-lu.jpeg",{"body":1361},{"type":15,"value":1362,"toc":1367},[1363,1365],[48,1364,1358],{},[48,1366,1269],{},{"title":18,"searchDepth":19,"depth":19,"links":1368},[],"Director of Engineering, StreamNative","authors\u002Fneng-lu","https:\u002F\u002Ftwitter.com\u002Fnlu90","R1K8DYRoq92ZrwHOmKtJMRfm-cuTjXTqAv0Cc3Q9IM4",[1374,1382,1387],{"path":1375,"title":1376,"date":1377,"image":1378,"link":-1,"collection":1379,"resourceType":1380,"score":1381,"id":1375},"\u002Fblog\u002Fpulsar-newbie-guide-for-kafka-engineers-part-7-pulsar-security-for-kafka-admins","Pulsar Newbie Guide for Kafka Engineers (Part 7): Pulsar Security for Kafka Admins","2025-09-09","\u002Fimgs\u002Fblogs\u002F68c04c79ca9f71615177cbe3_SN-sm-Pulsar-for-Kafka-Engineers-series-7.png","blogs","Blog",1,{"path":1383,"title":1384,"date":1385,"image":1386,"link":-1,"collection":1379,"resourceType":1380,"score":1381,"id":1383},"\u002Fblog\u002Fpulsar-newbie-guide-for-kafka-engineers-part-6-schema-management-in-pulsar","Pulsar Newbie Guide for Kafka Engineers (Part 6): Schema Management in Pulsar","2025-09-04","\u002Fimgs\u002Fblogs\u002F68b9a6bd89138fb38f8e8af0_SN-sm-Pulsar-for-Kafka-Engineers-series-6.png",{"path":1388,"title":1389,"date":1390,"image":1391,"link":-1,"collection":1379,"resourceType":1380,"score":1381,"id":1388},"\u002Fblog\u002Fpulsar-newbie-guide-for-kafka-engineers-part-5-retention-ttl-compaction","Pulsar Newbie Guide for Kafka Engineers (Part 5): Retention, TTL & Compaction","2025-09-03","\u002Fimgs\u002Fblogs\u002F68b858ba57c99ef2f3be6848_SN-sm-Pulsar-for-Kafka-Engineers-series-5.png",1776409759840]