[{"data":1,"prerenderedAt":1787},["ShallowReactive",2],{"active-banner":3,"navbar-featured-partner-blog":24,"navbar-pricing-featured":306,"blog-\u002Fblog\u002Fextensible-load-balancer-pulsar-3-0":1086,"blog-authors-\u002Fblog\u002Fextensible-load-balancer-pulsar-3-0":1731,"related-\u002Fblog\u002Fextensible-load-balancer-pulsar-3-0":1766},{"id":4,"title":5,"date":6,"dismissible":7,"extension":8,"link":9,"link2":10,"linkText":11,"linkText2":12,"meta":13,"stem":21,"variant":22,"__hash__":23},"banners\u002Fbanners\u002Flakestream-ufk-launch.md","StreamNative Introduces Lakestream Architecture and Launches Native Kafka Service","2026-04-07",true,"md","\u002Fblog\u002Ffrom-streams-to-lakestreams","https:\u002F\u002Fconsole.streamnative.cloud\u002Fsignup?from=banner_lakestream-launch","Read Announcement","Sign Up Now",{"body":14},{"type":15,"value":16,"toc":17},"minimark",[],{"title":18,"searchDepth":19,"depth":19,"links":20},"",2,[],"banners\u002Flakestream-ufk-launch","default","zRueBGutATZB0ZnFFHwaEV7F0Di4tnZUHhgOiI4cu6k",{"id":25,"title":26,"authors":27,"body":29,"category":289,"createdAt":290,"date":291,"description":292,"extension":8,"featured":7,"image":293,"isDraft":294,"link":290,"meta":295,"navigation":7,"order":296,"path":297,"readingTime":298,"relatedResources":290,"seo":299,"stem":300,"tags":301,"__hash__":305},"blogs\u002Fblog\u002Fstreamnative-recognized-in-the-forrester-wave-streaming-data-platforms-2025.md","StreamNative Recognized as a Contender in The Forrester Wave™: Streaming Data Platforms, Q4 2025",[28],"David Kjerrumgaard",{"type":15,"value":30,"toc":276},[31,39,47,51,67,73,78,81,87,102,109,115,118,124,127,134,140,143,146,157,163,169,172,175,178,184,191,194,197,204,207,210,224,229,233,237,241,245,249,251,268,270],[32,33,35],"h3",{"id":34},"receives-highest-possible-scores-in-both-the-messaging-and-resource-optimization-criteria",[36,37,38],"em",{},"Receives Highest Possible Scores in BOTH the Messaging and Resource Optimization Criteria",[40,41,43],"h2",{"id":42},"introduction",[44,45,46],"strong",{},"Introduction",[48,49,50],"p",{},"Real-time data has become the backbone of modern innovation. As artificial intelligence (AI) and digital services demand instantaneous insights, organizations are realizing that streaming data is no longer optional – it's essential for delivering timely, context-rich experiences. StreamNative's data streaming platform is built precisely for this reality, ensuring data is immediate, reliable, and ready to power critical applications.",[48,52,53,54,63,64],{},"Today, we're excited to announce that Forrester Research has named StreamNative as a Contender in its evaluation, ",[55,56,58],"a",{"href":57},"\u002Freports\u002Frecognized-in-the-forrester-wave-tm-streaming-data-platforms-q4-2025",[36,59,60],{},[44,61,62],{},"The Forrester Wave™: Streaming Data Platforms, Q4 2025",". 
This report evaluated 15 top streaming data platform providers, and we're proud to share that ",[44,65,66],{},"StreamNative received the highest scores possible—5 out of 5—in both the Messaging and Resource Optimization criteria.",[48,68,69,70],{},"Forrester's Take: ",[36,71,72],{},"\"StreamNative is a good fit for enterprises that want an Apache Pulsar implementation that is also compatible with Kafka APIs.\"",[48,74,75],{},[36,76,77],{},"— The Forrester Wave™: Streaming Data Platforms, Q4 2025",[48,79,80],{},"Being recognized in the Forrester Wave is a proud milestone, and for us, it highlights how far StreamNative has come in enabling enterprises to unlock the power of real-time data. In the sections below, we'll dive into what we believe sets StreamNative apart—from our modern architecture and cloud-native design to our open-source foundation and real-time use cases—and how we see these strengths aligning with Forrester's findings.",[40,82,84],{"id":83},"trusted-by-industry-leaders",[44,85,86],{},"Trusted by Industry Leaders",[48,88,89,90,93,94,97,98,101],{},"Companies across industries are already leveraging StreamNative to drive real-time outcomes. Global enterprises like ",[44,91,92],{},"Cisco"," rely on StreamNative to handle massive IoT telemetry, supporting 245 million+ connected devices. Martech leaders such as ",[44,95,96],{},"Iterable"," process billions of events per day with StreamNative for hyper-personalized customer engagement. And in financial services, ",[44,99,100],{},"FICO"," trusts StreamNative to power its real-time fraud detection and analytics pipelines with a secure, scalable streaming backbone.",[48,103,104,105,108],{},"The Forrester report notes that “",[36,106,107],{},"Customers appreciate the lower infrastructure costs that result from StreamNative’s cost-efficient, Kafka-compatible architecture. Customers note excellent support responsiveness…","”",[40,110,112],{"id":111},"modern-cloud-native-architecture-built-for-scale",[44,113,114],{},"Modern, Cloud-Native Architecture Built for Scale",[48,116,117],{},"From day one, StreamNative was designed with a modern architecture to meet the demanding scale and flexibility requirements of real-time data. Unlike legacy streaming systems that often rely on tightly coupled storage and compute, StreamNative's platform takes a cloud-native approach: it decouples these layers to enable elastic scalability and efficient resource utilization across any environment. The core is powered by Apache Pulsar—a distributed messaging and streaming engine—enhanced with multi-protocol support (including native Apache Kafka API compatibility) to unify diverse data streams under one roof. This means organizations can consolidate siloed messaging systems and handle both high-volume event streams and traditional message queues on a single platform, without sacrificing performance or reliability.",[48,119,120,121,108],{},"Forrester's evaluation noted that “",[36,122,123],{},"StreamNative aims to provide a high-performance, multi-protocol streaming data platform: It uses Apache Pulsar with Kafka API compatibility to deliver cost-efficient, real-time applications for enterprises. 
It appeals to organizations that want a flexible, low-cost streaming solution, due to its focus on scalability and resource optimization, while its investments in Pulsar’s open-source ecosystem and performance optimization make it the primary platform for enterprises wishing to implement Pulsar.",[48,125,126],{},"Our cloud-first, leaderless architecture (with no single broker bottlenecks) and tiered storage model were built to maximize throughput and cost-efficiency for real-time workloads. By separating compute from storage and leveraging distributed object storage, StreamNative can retain huge volumes of event data indefinitely while keeping compute costs in check—effectively providing a flexible, low-cost streaming solution.",[48,128,129,130,133],{},"This modern design not only delivers high performance, but also ensures fault tolerance and geo-distribution out of the box, so enterprises can trust their streaming data is always available and durable. As Forrester’s evaluation noted, StreamNative ",[36,131,132],{},"\"excels at messaging and resource optimization\" and “Its platform supports use cases like real-time analytics and event-driven architectures with robust scalability.","” Our architecture provides the strong foundation that today's real-time applications demand, from ultra-fast data ingestion to seamless scale-out across hybrid and multi-cloud environments.",[40,135,137],{"id":136},"open-source-foundation-and-pulsar-expertise",[44,138,139],{},"Open Source Foundation and Pulsar Expertise",[48,141,142],{},"StreamNative's DNA is rooted in open source innovation. Our founders are the original creators of Apache Pulsar, and we've built our platform with the same open principles: freedom, flexibility, and community-driven innovation. For developers and data teams, this means adopting StreamNative comes with no proprietary lock-in—instead, you get a platform built on open standards and a thriving ecosystem. We offer broad API compatibility (Pulsar, Kafka, JMS, MQTT, and more) so that teams can work with familiar interfaces and integrate StreamNative into existing systems with ease.",[48,144,145],{},"StreamNative is the primary commercial contributor to the Apache Pulsar project and its surrounding ecosystem. We invest heavily in Pulsar's ongoing improvements, and our investments in Pulsar's open-source ecosystem and performance optimization bolster StreamNative's value. We also foster a vibrant community through initiatives like the Data Streaming Summit and free training resources.",[48,147,148,149,152,153,156],{},"Forrester's assessment noted that StreamNative’s “",[36,150,151],{},"events-driven agents, extensibility, and performance architecture are solid,","” and we're continuing to build on that foundation. ",[44,154,155],{},"We're actively investing in expanding our tooling for observability, governance, schema management, and developer productivity","—areas we recognize as critical for enterprise adoption and where we're committed to accelerating our roadmap.",[48,158,159,160],{},"Being open also means embracing an open ecosystem of technologies. StreamNative actively integrates with the tools and platforms that matter most to our users. We partner with industry leaders like Snowflake, Databricks, Google, and Ververica to ensure our streaming platform works seamlessly with data warehouses, lakehouse storage, and stream processing frameworks. 
Forrester’s evaluation observed that StreamNative’s ",[36,161,162],{},"\"investments in Pulsar’s open-source ecosystem and performance optimization make it the primary platform for enterprises wishing to implement Pulsar.\"",[40,164,166],{"id":165},"powering-real-time-use-cases-across-industries",[44,167,168],{},"Powering Real-Time Use Cases Across Industries",[48,170,171],{},"One of the greatest validations of StreamNative's approach is the success our customers are achieving with real-time data. StreamNative's platform is versatile and use-case agnostic—if an application demands high-volume, low-latency data movement, we can power it. This flexibility is why our customer base spans industries from finance and IoT to major automobile manufacturers and online gaming. The common thread is that these organizations need to process and react to data in milliseconds, and StreamNative is delivering the capabilities to make that possible.",[48,173,174],{},"Cisco uses StreamNative to underpin an IoT telemetry system of colossal scale, connecting hundreds of millions of devices and thousands of enterprise clients with real-time data streams. The platform's multi-tenant design and proven reliability allow Cisco to offer its customers a live feed of device data with unwavering confidence. In the financial sector, FICO has built streaming pipelines on StreamNative to detect fraud as transactions happen and to monitor systems in real time. With StreamNative's strong guarantees around message durability and ordering, FICO can catch anomalies or suspicious patterns within seconds. And in digital customer engagement, Iterable relies on StreamNative to process billions of events every day—clicks, views, purchases—so that marketers can trigger personalized campaigns instantly based on user behavior.",[48,176,177],{},"Our customers uniformly deal with mission-critical data streams, where downtime or delays are unacceptable. StreamNative's fault-tolerant, scalable infrastructure has proven equal to the task, handling scenarios like bursting to millions of events per second or seamlessly spanning multiple cloud regions. Forrester's report recognized StreamNative for supporting event-driven architectures with robust scalability—which for us is a reflection of our platform's ability to meet the most demanding enterprise requirements.",[40,179,181],{"id":180},"continuing-to-innovate-ursa-orca-and-the-road-ahead",[44,182,183],{},"Continuing to Innovate: Ursa, Orca, and the Road Ahead",[48,185,186,187,190],{},"While we are thrilled to be recognized in Forrester's Streaming Data Platforms Wave, we view this as just the beginning. StreamNative's vision has always been bold: to ",[44,188,189],{},"provide a unified platform that not only handles today's streaming needs but also anticipates the emerging requirements of tomorrow",".",[48,192,193],{},"One key area of focus is the convergence of streaming data with advanced analytics and AI. As Forrester points out in the report, technology leaders should look for platforms that natively integrate messaging, stream processing, and analytics to provide AI agents with real-time, contextualized information. We couldn't agree more. 
Our award-winning Ursa Engine and Orca Agent Engine are aimed at extending our platform up the stack—bridging the gap between data streams and data lakes, and between event streams and intelligent processing.",[48,195,196],{},"Our new Ursa Engine introduces a lakehouse-native approach to streaming: it can write events directly to table formats like Iceberg on cloud storage, eliminating entire classes of ETL jobs and making fresh data instantly available for analytics queries. By integrating streaming and lakehouse technologies, we help customers collapse data silos and accelerate their AI\u002FML pipelines.",[48,198,199,200,203],{},"Beyond analytics integration, we are also enhancing StreamNative with more out-of-the-box processing and governance capabilities. In the coming months, we plan to introduce new features for lightweight stream processing and transformation, making it easier to build reactive applications directly on the platform. We're also expanding our ecosystem of connectors and integrations, so that whether your data lands in Snowflake, Databricks, or an AI model, StreamNative will seamlessly feed it. ",[44,201,202],{},"We're investing significantly in enterprise features including security, schema registry, governance, and monitoring tooling","—capabilities that are essential for mission-critical deployments and where we're committed to continued improvement.",[48,205,206],{},"This recognition from Forrester energizes us to keep innovating at full speed. We're sharing this honor with our amazing customers, community, and partners who drive us forward every day. Your feedback and real-world challenges have helped shape StreamNative into what it is today, and together, we will shape the future of streaming data. Thank you for joining us on this journey—we're just getting started, and we can't wait to deliver even more value as we continue to evolve our platform. 
Onward to real-time everything!",[208,209],"hr",{},[32,211,213],{"id":212},"streamnative-in-the-forrester-wave-evaluation-findings",[44,214,215,216,223],{},"StreamNative in ",[44,217,218],{},[55,219,220],{"href":57},[44,221,222],{},"The Forrester Wave™",": Evaluation Findings",[225,226,228],"h5",{"id":227},"recognized-as-a-contender-among-15-streaming-data-platform-providers","• Recognized as a Contender among 15 streaming data platform providers",[225,230,232],{"id":231},"received-the-highest-scores-possible-50-in-both-the-messaging-and-resource-optimization-criteria","• Received the highest scores possible (5.0) in both the Messaging and Resource Optimization criteria",[225,234,236],{"id":235},"cited-as-the-primary-platform-for-enterprises-wishing-to-implement-pulsar","• Cited as the primary platform for enterprises wishing to implement Pulsar",[225,238,240],{"id":239},"noted-for-excelling-at-messaging-and-resource-optimization","• Noted for excelling at messaging and resource optimization",[225,242,244],{"id":243},"customers-cited-lower-infrastructure-costs-and-excellent-support-responsiveness","• Customers cited lower infrastructure costs and excellent support responsiveness",[225,246,248],{"id":247},"recognized-for-supporting-event-driven-architectures-with-robust-scalability","• Recognized for supporting event-driven architectures with robust scalability",[208,250],{},[252,253,255,256,259,260,190],"h6",{"id":254},"forrester-disclaimer-forrester-does-not-endorse-any-company-product-brand-or-service-included-in-its-research-publications-and-does-not-advise-any-person-to-select-the-products-or-services-of-any-company-or-brand-based-on-the-ratings-included-in-such-publications-information-is-based-on-the-best-available-resources-opinions-reflect-judgment-at-the-time-and-are-subject-to-change-for-more-information-read-about-forresters-objectivity-here","Forrester Disclaimer: ",[36,257,258],{},"Forrester does not endorse any company, product, brand, or service included in its research publications and does not advise any person to select the products or services of any company or brand based on the ratings included in such publications. Information is based on the best available resources. Opinions reflect judgment at the time and are subject to change",". For more information, read about Forrester’s objectivity ",[55,261,265],{"href":262,"rel":263},"https:\u002F\u002Fwww.forrester.com\u002Fabout-us\u002Fobjectivity\u002F",[264],"nofollow",[36,266,267],{},"here",[208,269],{},[252,271,273],{"id":272},"apache-apache-pulsar-apache-kafka-apache-flink-and-other-names-are-trademarks-of-the-apache-software-foundation-no-endorsement-by-apache-or-other-third-parties-is-implied",[36,274,275],{},"Apache®, Apache Pulsar®, Apache Kafka®, Apache Flink® and other names are trademarks of The Apache Software Foundation. No endorsement by Apache or other third parties is implied.",{"title":18,"searchDepth":19,"depth":19,"links":277},[278,280,281,282,283,284,285],{"id":34,"depth":279,"text":38},3,{"id":42,"depth":19,"text":46},{"id":83,"depth":19,"text":86},{"id":111,"depth":19,"text":114},{"id":136,"depth":19,"text":139},{"id":165,"depth":19,"text":168},{"id":180,"depth":19,"text":183,"children":286},[287],{"id":212,"depth":279,"text":288},"StreamNative in The Forrester Wave™: Evaluation Findings","Company",null,"2025-12-16","StreamNative is recognized in The Forrester Wave™: Streaming Data Platforms, Q4 2025. 
Discover why Forrester highlights StreamNative's high-performance messaging, efficient resource use, and cost-effective Kafka API compatibility for real-time innovation.","\u002Fimgs\u002Fblogs\u002F693bd36cf01b217dcb67278f_Streamnative_blog_thumbnail.png",false,{},0,"\u002Fblog\u002Fstreamnative-recognized-in-the-forrester-wave-streaming-data-platforms-2025","10 mins read",{"title":26,"description":292},"blog\u002Fstreamnative-recognized-in-the-forrester-wave-streaming-data-platforms-2025",[302,303,304],"Announcements","Real-Time","Forrester","sOeeJtEO3O-IIfTPJjY1AFOMawZ_rf8FOH8A98NEKgU",{"id":307,"title":308,"authors":309,"body":314,"category":1073,"createdAt":290,"date":1074,"description":1075,"extension":8,"featured":7,"image":1076,"isDraft":294,"link":290,"meta":1077,"navigation":7,"order":296,"path":1078,"readingTime":1079,"relatedResources":290,"seo":1080,"stem":1081,"tags":1082,"__hash__":1085},"blogs\u002Fblog\u002Fhow-we-run-a-5-gb-s-kafka-workload-for-just-50-per-hour.md","How We Run a 5 GB\u002Fs Kafka Workload for Just $50 per Hour",[310,311,312,313],"Matteo Meril","Neng Lu","Hang Chen","Penghui Li",{"type":15,"value":315,"toc":1043},[316,319,322,325,328,331,335,338,348,354,357,365,370,374,381,384,387,395,399,402,407,411,414,417,420,423,432,436,439,450,453,457,460,463,474,477,481,485,493,496,500,508,537,541,544,549,553,556,560,563,566,571,580,585,588,591,602,606,609,620,624,627,630,635,638,667,671,673,679,682,687,692,695,699,713,717,728,732,747,756,767,770,773,777,780,783,794,797,800,803,808,813,817,821,838,842,856,861,865,876,879,895,899,910,915,920,928,932,935,939,946,950,953,962,967,976,982,991,1000,1009,1018,1027,1035],[48,317,318],{},"The rise of DeepSeek has shaken the AI infrastructure market, forcing companies to confront the escalating costs of training and deploying AI models. But the real pressure point isn’t just compute—it’s data acquisition and ingestion costs.",[48,320,321],{},"As businesses rethink their AI cost-containment strategies, real-time data streaming is emerging as a critical enabler. The growing adoption of Kafka as a standard protocol has expanded cost-efficient options, allowing companies to optimize streaming analytics while keeping expenses in check.",[48,323,324],{},"Ursa, the data streaming engine powering StreamNative’s managed Kafka service, is built for this new reality. With its leaderless architecture and native lakehouse storage integration, Ursa eliminates costly inter-zone network traffic for data replication and client-to-broker communication while ensuring high availability at minimal operational cost.",[48,326,327],{},"In this blog post, we benchmarked the infrastructure cost and total cost of ownership (TCO) for running a 5GB\u002Fs Kafka workload across different Kafka vendors, including Redpanda, Confluent WarpStream, and AWS MSK. Our benchmark results show that Ursa can sustain 5GB\u002Fs Kafka workloads at just 5% of the cost of traditional streaming engines like Redpanda—making it the ideal solution for high-performance, cost-efficient ingestion and data streaming for data lakehouses and AI workloads.",[48,329,330],{},"Note: We also evaluated vanilla Kafka in our benchmark; however, for simplicity, we have focused our cost comparison on vendor solutions rather than self-managed deployments. That said, it is important to highlight that both Redpanda and vanilla Kafka use a leader-based data replication approach. 
In a data-intensive, network-bound workload like 5GB\u002Fs streaming, with the same machine type and replication factor, Redpanda and vanilla Kafka produced nearly identical cost profiles.",[40,332,334],{"id":333},"key-benchmark-findings","Key Benchmark Findings",[48,336,337],{},"Ursa delivered 5 GB\u002Fs of sustained throughput at an infrastructure cost of just $54 per hour. For comparison:",[339,340,341,345],"ul",{},[342,343,344],"li",{},"MSK: $303 per hour → 5.6x more expensive than Ursa",[342,346,347],{},"Redpanda: $988 per hour → 18x more expensive than Ursa",[48,349,350],{},[351,352],"img",{"alt":18,"src":353},"\u002Fimgs\u002Fblogs\u002F679c71b67d9046f26edc7977_AD_4nXfvTqyBNUBu2lObdkKAx-5UNkpNP8UYULLZyOcixE6z99VMZUUEsUqWjzexI7vjyNGRNSAUoM9smYvdTP55ctAhIbrs5lmQgcSVMWdaoigbWouCl95DVSQsxooY-qqfGcYqS4g4zA.png",[48,355,356],{},"Beyond infrastructure costs, when factoring in storage pricing, vendor pricing, and operational expenses, Ursa’s total cost of ownership (TCO) for a 5GB\u002Fs workload with a 7-day retention period is:",[339,358,359,362],{},[342,360,361],{},"50% cheaper than Confluent WarpStream",[342,363,364],{},"85% cheaper than MSK and Redpanda",[48,366,367],{},[351,368],{"alt":18,"src":369},"\u002Fimgs\u002Fblogs\u002F679c602d77e9c706de5343b8_AD_4nXeDv8rrv_C1CTCCiqYo1zpvlGYbdBk1r0VEqovAPu22iFMQZgh54Hfw9PBMLzM7jDFxKwAFDxbdG0np4XVk_tGsWhEKMloLRcmmea7lvueCx-0cFsyaE3Mya4Mxc1Dox95A6JEc.png",[40,371,373],{"id":372},"ursa-highly-cost-efficient-data-streaming-at-scale","Ursa: Highly Cost-Efficient Data Streaming at Scale",[48,375,376,380],{},[55,377,379],{"href":378},"\u002Fblog\u002Fursa-reimagine-apache-kafka-for-the-cost-conscious-data-streaming","Ursa"," is a next-generation data streaming engine designed to deliver high performance at a fraction of the cost of traditional disk-based solutions. It is fully compatible with Apache Kafka and Apache Pulsar APIs, while leveraging a leaderless, lakehouse-native architecture to maximize scalability, efficiency, and cost savings.",[48,382,383],{},"Ursa’s key innovation is separating storage from compute and decoupling metadata\u002Findex operations from data operations by utilizing cloud object storage (e.g., AWS S3) instead of costly inter-zone disk-based replication. It also employs open lakehouse formats (Iceberg and Delta Lake), enabling columnar compression to significantly reduce storage costs while maintaining durability and availability.",[48,385,386],{},"In contrast, traditional streaming systems—like Kafka and Redpanda—depend on leader-based architectures, which drive up inter-zone traffic costs due to replication and client communication. Ursa mitigates these costs by:",[339,388,389,392],{},[342,390,391],{},"Eliminating inter-zone traffic costs via a leaderless architecture.",[342,393,394],{},"Replacing costly inter-zone replication with direct writes to cloud storage using open lakehouse formats.",[40,396,398],{"id":397},"how-ursa-eliminates-inter-zone-traffic","How Ursa Eliminates Inter-Zone Traffic",[48,400,401],{},"Ursa minimizes inter-zone traffic by leveraging a leaderless architecture, which eliminates inter-zone communication between clients and brokers, and lakehouse-native storage, which removes the need for inter-zone data replication. 
This approach ensures high availability and scalability while avoiding unnecessary cross-zone data movement.",[48,403,404],{},[351,405],{"alt":18,"src":406},"\u002Fimgs\u002Fblogs\u002F679c602e21b3571bb7117dca_AD_4nXd7Oahc77NjRLNvA9clLt0tsyU6MrIqVibFYv5pW5giTIcCHPr3EA_yTGzfVEUIVO3VXK56qWK8zmBCp5lY0E_4nmlWIPFrHjtHylA5NhwELjn-UB0fLG2h_kbrxrc7Cs_edvveNA.png",[32,408,410],{"id":409},"leaderless-architecture","Leaderless architecture",[48,412,413],{},"Traditional streaming engines such as Kafka, Pulsar, or Redpanda rely on a leader-based model, where each partition is assigned to a single leader broker that handles all writes and reads.",[48,415,416],{},"Pros of Leader-Based Architectures:\n✔ Maintains message ordering via local sequence IDs\n✔ Delivers low latency and high performance through message caching",[48,418,419],{},"Cons of Leader-Based Architectures:\n✖ Throughput bottlenecked by a single broker per partition\n✖ Inter-zone traffic required for high availability in multi-AZ deployments",[48,421,422],{},"While Kafka and Pulsar offer partial solutions (e.g., reading from followers, shadow topics) to reduce read-related inter-zone traffic, producers still send data to a single leader.",[48,424,425,426,431],{},"Ursa removes the concept of topic ownership, allowing any broker in the cluster to handle reads or writes for any partition. The primary challenge—ensuring message ordering—is solved with ",[55,427,430],{"href":428,"rel":429},"https:\u002F\u002Fgithub.com\u002Fstreamnative\u002Foxia",[264],"Oxia",", a scalable metadata and index service created by StreamNative in 2022.",[32,433,435],{"id":434},"oxia-the-metadata-layer-enabling-leaderless-architecture","Oxia: The Metadata Layer Enabling Leaderless Architecture",[48,437,438],{},"Ensuring message ordering in a leaderless architecture is complex, but Ursa solves this with Oxia:",[339,440,441,444,447],{},[342,442,443],{},"Handles millions of metadata\u002Findex operations per second",[342,445,446],{},"Generates sequential IDs to maintain strict message ordering",[342,448,449],{},"Optimized for Kubernetes with horizontal scalability",[48,451,452],{},"Producers and consumers can connect to any broker within their local AZ, eliminating inter-zone traffic costs while maintaining performance through localized caching.",[32,454,456],{"id":455},"zero-interzone-data-replication","Zero inter-zone data replication",[48,458,459],{},"In most distributed systems, data replication from a leader (primary) to followers (replicas) is crucial for fault tolerance and availability. 
However, replication across zones can inflate infrastructure expenses substantially.",[48,461,462],{},"Ursa avoids these costs by writing data directly to cloud storage (e.g., AWS S3, Google GCS):",[339,464,465,468,471],{},[342,466,467],{},"Built-In Resilience: Cloud storage inherently offers high availability and fault tolerance without inter-zone traffic fees.",[342,469,470],{},"Tradeoff: Slightly higher latency (sub-second, with p99 at 500 milliseconds) compared to local disk\u002FEBS (single-digit to sub-100 milliseconds), in exchange for significantly lower costs (up to 10x lower).",[342,472,473],{},"Flexible Modes: Ursa is an addition to the classic BookKeeper-based engine, providing users with the flexibility to optimize for either cost or low latency based on their workload requirements.",[48,475,476],{},"By foregoing conventional replication, Ursa slashes inter-zone traffic costs and associated complexities—making it a compelling option for organizations seeking to balance high-performance data streaming with strict budget constraints.",[40,478,480],{"id":479},"how-we-ran-a-5-gbs-test-with-ursa","How We Ran a 5 GB\u002Fs Test with Ursa",[32,482,484],{"id":483},"ursa-cluster-deployment","Ursa Cluster Deployment",[339,486,487,490],{},[342,488,489],{},"9 brokers across 3 availability zones, each on m6i.8xlarge (fixed 12.5 Gbps bandwidth, 32 vCPU cores, 128 GB memory).",[342,491,492],{},"Oxia cluster (metadata store) with 3 nodes of m6i.8xlarge, distributed across three availability zones (AZs).",[48,494,495],{},"During peak throughput (5 GB\u002Fs), each broker’s network usage was about 10 Gbps.",[32,497,499],{"id":498},"openmessaging-benchmark-workers-configuration","OpenMessaging Benchmark Workers & Configuration",[48,501,502,503,507],{},"The OpenMessaging Benchmark (OMB) Framework is a suite of tools that make it easy to benchmark distributed messaging systems in the cloud. Please check ",[55,504,505],{"href":505,"rel":506},"https:\u002F\u002Fopenmessaging.cloud\u002Fdocs\u002Fbenchmarks\u002F",[264]," for details.",[339,509,510,525,534],{},[342,511,512,513,518,519,524],{},"12 OMB workers: 6 for ",[55,514,517],{"href":515,"rel":516},"https:\u002F\u002Fgist.github.com\u002Fcodelipenghui\u002Fd1094122270775e4f1580947f80c5055",[264],"producers",", 6 for ",[55,520,523],{"href":521,"rel":522},"https:\u002F\u002Fgist.github.com\u002Fcodelipenghui\u002F06bada89381fb77a7862e1b4c1d8963d",[264],"consumers"," across 3 availability zones, on m6i.8xlarge instances. 
Each worker is configured with 12 CPU cores and 48 GB memory.",[342,526,527,528,533],{},"Sample YAML ",[55,529,532],{"href":530,"rel":531},"https:\u002F\u002Fgist.github.com\u002Fcodelipenghui\u002F204c1f26c4d44a218ae235bf2de99904",[264],"scripts"," provided for Kafka-compatible configuration and rate limits.",[342,535,536],{},"Achieved consistent 5 GB\u002Fs publish\u002Fsubscribe throughput.",[40,538,540],{"id":539},"ursa-benchmark-tests-results","Ursa Benchmark Tests & Results",[48,542,543],{},"The following diagram demonstrates that Ursa can consistently handle 5 GB\u002Fs of traffic, fully saturating the network across all broker nodes.",[48,545,546],{},[351,547],{"alt":18,"src":548},"\u002Fimgs\u002Fblogs\u002F679c602d7b261bac1113f7d6_AD_4nXdDPsRc3koXICiFF0bqSmGWbJt_RlUy4FE3ruuWOfbCfpcqZ1dejjqGbkaCJv2hQFL1nirRouBVRW2l5uMWBvY9naMqGB_wHcLI14dBM0f85TXhmdm3UxEv1yGX9Y4hf5FttSkZew.png",[40,550,552],{"id":551},"comparing-infrastructure-cost","Comparing Infrastructure Cost",[48,554,555],{},"This benchmark first evaluates infrastructure costs of running a 5 GB\u002Fs streaming workload (1:1 producer-to-consumer ratio) across different data streaming engines, including Ursa, Redpanda, and AWS MSK, with a focus on multi-AZ deployments to ensure a fair comparison.",[32,557,559],{"id":558},"test-setup-key-assumptions","Test Setup & Key Assumptions",[48,561,562],{},"All tests use multi-AZ configurations, with clusters and clients distributed across three AWS availability zones (AZs). Cluster size scales proportionally to the number of AZs, and rack-awareness is enabled for all engines to evenly distribute topic partitions and leaders.",[48,564,565],{},"To ensure a fair comparison, we selected the same machine type capable of fully utilizing both network and storage bandwidth for Ursa and Redpanda in this 5GB\u002Fs test:",[339,567,568],{},[342,569,570],{},"9 × m6i.8xlarge instances",[48,572,573,574,579],{},"However, MSK's storage bandwidth limits vary depending on the selected instance type, with the highest allowed limit capped at 1000 MiB\u002Fs per broker, according to",[55,575,578],{"href":576,"rel":577},"https:\u002F\u002Fdocs.aws.amazon.com\u002Fmsk\u002Flatest\u002Fdeveloperguide\u002Fmsk-provision-throughput-management.html#throughput-bottlenecks",[264]," AWS documentation",". 
Given this constraint, achieving 5 GB\u002Fs throughput with a replication factor of 3 required the following setup:",[339,581,582],{},[342,583,584],{},"15 × kafka.m7g.8xlarge (32 vCPUs, 128 GB memory, 15 Gbps network, 4000 GiB EBS).",[48,586,587],{},"This configuration was necessary to work around MSK's storage bandwidth limitations, ensuring a comparable cost basis to other evaluated streaming engines.",[48,589,590],{},"Additional key assumptions include:",[339,592,593,596,599],{},[342,594,595],{},"Inter-AZ producer traffic: For leader-based engines, two-thirds of producer-to-broker traffic crosses AZs due to leader distribution.",[342,597,598],{},"Consumer optimizations: Follower fetch is enabled across all tests, eliminating inter-AZ consumer traffic.",[342,600,601],{},"Storage cost exclusions: This benchmark only evaluates streaming costs, assuming no long-term data retention.",[32,603,605],{"id":604},"inter-broker-replication-costs","Inter-Broker Replication Costs",[48,607,608],{},"Inter-broker (cross-AZ) replication is a major cost driver for data streaming engines:",[339,610,611,614,617],{},[342,612,613],{},"RedPanda: Inter-broker replication is not free, leading to substantial costs when data must be copied across multiple availability zones.",[342,615,616],{},"AWS MSK: Inter-broker replication is free, but MSK instance pricing is significantly higher (e.g., $3.264 per hour for kafka.m7g.8xlarge vs $1.306 per hour for an on-demand m7g.8xlarge). The storage price of MSK is $0.10 per GB-month which is significantly higher than st1, which costs $0.045 per GB-month. Even though replication is free, client-to-broker traffic still incurs inter-AZ charges.",[342,618,619],{},"Ursa: No inter-broker replication costs due to its leaderless architecture, eliminating inter-zone replication costs entirely.",[32,621,623],{"id":622},"zone-affinity-reducing-inter-az-costs","Zone Affinity: Reducing Inter-AZ Costs",[48,625,626],{},"We evaluated zone affinity mechanisms to further reduce inter-AZ data transfer costs.",[48,628,629],{},"Consumers:",[339,631,632],{},[342,633,634],{},"Follower fetch is enabled across all tests, ensuring consumers fetch data from replicas in their local AZ—eliminating inter-zone consumer traffic except for metadata lookups",[48,636,637],{},"Producers:",[339,639,640,649,658],{},[342,641,642,643,648],{},"Kafka protocol lacks an easy way to enforce producer AZ affinity (though ",[55,644,647],{"href":645,"rel":646},"https:\u002F\u002Fcwiki.apache.org\u002Fconfluence\u002Fdisplay\u002FKAFKA\u002FKIP-1123:+Rack-aware+partitioning+for+Kafka+Producer",[264],"KIP-1123"," aims to address this). And it only works with the default partitioner (i.e., when no record partition or record key is specified).",[342,650,651,652,657],{},"Redpanda recently introduced ",[55,653,656],{"href":654,"rel":655},"https:\u002F\u002Fdocs.redpanda.com\u002Fredpanda-cloud\u002Fdevelop\u002Fproduce-data\u002Fleader-pinning\u002F",[264],"leader pinning",", but this only benefits setups where producers are confined to a single AZ—not applicable to our multi-AZ benchmark.",[342,659,660,661,666],{},"Ursa is the only system in this test with ",[55,662,665],{"href":663,"rel":664},"https:\u002F\u002Fdocs.streamnative.io\u002Fdocs\u002Fconfig-kafka-client#eliminate-cross-az-networking-traffic",[264],"built-in zone affinity for both producers and consumers",". 
It achieves this by embedding producer AZ information in client.id, allowing metadata lookups to route clients to local-AZ brokers, eliminating inter-AZ producer traffic.",[32,668,670],{"id":669},"cost-comparison-results","Cost Comparison Results",[48,672,337],{},[339,674,675,677],{},[342,676,344],{},[342,678,347],{},[48,680,681],{},"Ursa’s leaderless architecture, zone affinity, and native cloud storage integration deliver unparalleled cost efficiency, making it the most cost-effective choice for high-throughput data streaming workloads.",[48,683,684],{},[351,685],{"alt":18,"src":686},"\u002Fimgs\u002Fblogs\u002F679c72208198ca36a352f228_AD_4nXeeZuM8T-xBlD4Vf3j67K618n08qh8wIDLLtiLJG0ssA1Wj1V26u7wIDTX9sqLrtw8mB2c299dwzarGen62CG0Vh7nWstn5qbPGFcBaKJYEepTsLr5fHWv1U8uqbg8Y0UOK6fJ7.png",[48,688,689],{},[351,690],{"alt":18,"src":691},"\u002Fimgs\u002Fblogs\u002F679c625978031f40229de484_AD_4nXdLkLLJ30KKr-_A_rN1j8akVwBYacAWIPzWHoOReJF421890kfByZoQQxkLczihVSmiw5Q9J51-V9I2SEKITbwsYnANDDTlAVL5nQ_jfaHNTe9VEWhSoa7DZooCnilDYL6l6msmJg.png",[48,693,694],{},"The detailed infrastructure cost calculations for each data streaming engine are listed below:",[32,696,698],{"id":697},"streamnative-ursa","StreamNative - Ursa",[339,700,701,704,707,710],{},[342,702,703],{},"Server EC2 costs: 9 * $1.536\u002Fhr = $14",[342,705,706],{},"Client EC2 costs: 9 * $1.536\u002Fhr = $14",[342,708,709],{},"S3 write request costs: 1350 r\u002Fs * $0.005\u002F1000r * 3600s = $24",[342,711,712],{},"S3 read request costs: 1350 r\u002Fs * $0.0004\u002F1000r * 3600s = $2",[32,714,716],{"id":715},"aws-msk","AWS MSK",[339,718,719,722,725],{},[342,720,721],{},"Server EC2 costs: 15 * $3.264\u002Fhr = $49",[342,723,724],{},"Client-side EC2 costs: 9 * $1.536\u002Fhr = $14",[342,726,727],{},"Interzone traffic - producer to broker: 5GB\u002Fs * ⅔ * $0.02\u002FGB(in+out) * 3600 = $240",[32,729,731],{"id":730},"redpanda","Redpanda",[339,733,734,736,738,741,744],{},[342,735,703],{},[342,737,706],{},[342,739,740],{},"Interzone traffic - producer to broker: 5GB\u002Fs * ⅔ * $0.02\u002FGB(in+out) * 3600 = $240",[342,742,743],{},"Interzone traffic - replication: 10GB\u002Fs * $0.02\u002FGB(in+out) * 3600 = $720",[342,745,746],{},"Interzone traffic - broker to consumer: $0 (fetch from local zone)",[48,748,749,750,755],{},"Please note that we were unable to test ",[55,751,754],{"href":752,"rel":753},"https:\u002F\u002Fwww.redpanda.com\u002Fblog\u002Fcloud-topics-streaming-data-object-storage",[264],"Redpanda with Cloud Topics",", as it remains an announced but unreleased feature and is not yet available for evaluation. Based on the limited information available, while Cloud Topics may help optimize inter-zone data replication costs, producers still need to cross availability zones to connect to the topic partition owners and incur inter-zone traffic costs of up to $240 per hour.",[339,757,758,764],{},[342,759,760,763],{},[55,761,647],{"href":645,"rel":762},[264]," (when implemented) will help mitigate producer-to-broker inter-zone traffic, but it is not yet available, and it only works with the default partitioner (i.e., when no record partition or key is specified).",[342,765,766],{},"Redpanda’s leader pinning helps only when all producers for the pinned topic are confined to a single AZ. In multi-AZ environments (like our benchmark), inter-zone producer traffic remains unavoidable.",[48,768,769],{},"Additionally, Redpanda’s Cloud Topics architecture is not documented publicly. 
Their blog mentions \"leader placement rules to optimize produce latency and ingress cost,\" but it is unclear whether this represents a shift away from a leader-based architecture or if it uses techniques similar to Ursa’s zone-aware approach.",[48,771,772],{},"We may revisit this comparison as more details become available.",[40,774,776],{"id":775},"comparing-total-cost-of-ownership","Comparing Total Cost of Ownership",[48,778,779],{},"As highlighted earlier, with a BYOC Ursa setup, you can achieve 5 GB\u002Fs throughput at just 5% of the infrastructure cost of a traditional leader-based data streaming engine, such as Kafka or RedPanda, while managing the infrastructure yourself. This significant cost reduction is enabled by Ursa’s leaderless architecture and lakehouse-native storage design, which eliminate overhead costs such as inter-zone traffic and leader-based data replication. By leveraging a lakehouse-native, leaderless architecture, Ursa reduces resource requirements, enabling you to handle high data throughput efficiently and at a fraction of the cost of RedPanda.",[48,781,782],{},"Now, let’s examine the total cost comparison, evaluating Ursa alongside other vendors, including those that have adopted a leaderless architecture (e.g., Confluent WarpStream). This comparison is based on a 5GB\u002Fs workload with a 7-day retention period, factoring in both storage cost and vendor costs Here are the key findings:",[339,784,785,788,791],{},[342,786,787],{},"Ursa ($164,353\u002Fmonth) is: 50% cheaper than Confluent WarpStream ($337,068\u002Fmonth)",[342,789,790],{},"85% cheaper than AWS MSK ($1,115,251\u002Fmonth)",[342,792,793],{},"86% cheaper than Redpanda ($1,202,853\u002Fmonth)",[48,795,796],{},"In addition to Ursa’s architectural advantages—eliminating most inter-AZ traffic and leveraging lakehouse storage for cost-effective data retention—it also adopts a more fair and cost-efficient pricing model: Elastic Throughput-based pricing. This approach aligns costs with actual usage, avoiding unnecessary overhead.",[48,798,799],{},"Unlike WarpStream, which charges for both storage and throughput, Ursa ensures that customers only pay for the throughput they actively use. Ursa’s pricing is based on compressed data sent by clients, meaning the more data compressed on the client side, the lower the cost. 
In contrast, WarpStream prices are based on uncompressed data, unfairly inflating expenses and failing to incentivize customers to optimize their client applications.",[48,801,802],{},"This distinction is crucial, as compressed data reduces both storage and network costs, making Ursa’s pricing model not only more cost-effective but also more transparent and predictable.",[48,804,805],{},[351,806],{"alt":18,"src":807},"\u002Fimgs\u002Fblogs\u002F679c602d194800c9206d9d58_AD_4nXcFlf755xgyz7htxhMhBV5fGrsxy642mQNodt61DTok_z1dwkw5A6lkO5hatXVneCaB0anbZPAyvLI3MlIMuQEYLEACHHvQMOr5UfaB37dfzkdqewDEvcT-20VGd_zzvJsuA00zGA.png",[48,809,810],{},[351,811],{"alt":18,"src":812},"\u002Fimgs\u002Fblogs\u002F679c62594e9c2e629fae73aa_AD_4nXeU6cOgItnjLsEZCOf13TEvMY_SHWWIxYP2OYUj-B1GUPyWO78OG08K_v03hwYSVcg06f9dqDiGmdwy76vynjmiDGL5bluZ5_XF4nSU_r59oOZdfViXndXt6s11vVOY7qwfZN8v.png",[32,814,816],{"id":815},"cost-breakdown","Cost Breakdown",[818,819,820],"h4",{"id":697},"StreamNative – Ursa",[339,822,823,826,829,832,835],{},[342,824,825],{},"EC2 (Server): 9 × $1.536\u002Fhr × 24 hr × 30 days = $9,953.28",[342,827,828],{},"S3 Write Requests: 1,350 r\u002Fs × $0.005\u002F1,000 r × 3,600 s × 24 hr × 30 days = $17,496",[342,830,831],{},"S3 Read Requests: 1,350 r\u002Fs × $0.0004\u002F1,000 r × 3,600 s × 24 hr × 30 days = $1,400",[342,833,834],{},"S3 Storage Costs: 5 GB\u002Fs × $0.021\u002FGB × 3,600 s × 24 hr × 7 days = $63,504",[342,836,837],{},"Vendor Cost: 200 ETU × $0.50\u002Fhr × 24 hr × 30 days = $72,000",[818,839,841],{"id":840},"warpstream","WarpStream",[339,843,844,847],{},[342,845,846],{},"Based on WarpStream’s pricing calculator (as of January 29, 2025), we assume a 4:1 client data compression ratio, meaning 20 GB\u002Fs of uncompressed data translates to 5 GB\u002Fs of compressed data.",[342,848,849,850,855],{},"It's important to note that WarpStream’s pricing structure has fluctuated frequently throughout January. We observed the cost reported by their calculator changing from $409,644 per month to $337,068 per month. This variability has been previously highlighted in the blog post “",[55,851,854],{"href":852,"rel":853},"https:\u002F\u002Fbigdata.2minutestreaming.com\u002Fp\u002Fthe-brutal-truth-about-apache-kafka-cost-calculators",[264],"The Brutal Truth About Kafka Cost Calculators","”. To ensure transparency, we have documented the pricing as of January 29, 2025.",[48,857,858],{},[351,859],{"alt":18,"src":860},"\u002Fimgs\u002Fblogs\u002F679c602e42713e0028e9af5e_AD_4nXcu5_VWTLu9jRYs6zX1MBAOtLQEo5gyfNSWPcbpnQHXTa8qNCFAXezRR2E8daygzYTTwd4dhJjaLaLM8C6y_3OGbu2NS7pdvEv3a8-ptNKOg7AeKnYqPQCAYvQ5EuxzuI3JYIvY.png",[818,862,864],{"id":863},"msk","MSK",[339,866,867,870,873],{},[342,868,869],{},"EC2 (Server): 15 * $3.264\u002Fhr × 24 hr × 30 days = $35,251",[342,871,872],{},"Interzone Traffic (Client-Server): 5 GB\u002Fs × ⅔ × $0.02\u002FGB (in+out) × 3,600 s × 24 hr × 30 days = $172,800",[342,874,875],{},"Storage: 5 GB\u002Fs × $0.1\u002FGB-month × 3,600 s × 24 hr × 7 days * 3 replicas = $907,200",[818,877,731],{"id":878},"redpanda-1",[339,880,881,884,886,889,892],{},[342,882,883],{},"EC2 (Server): 9 × $1.536\u002Fhr × 24 hr × 30 days = $9953",[342,885,872],{},[342,887,888],{},"Interzone Traffic (Replication): 5 GB\u002Fs × 2 × $0.02\u002FGB (in+out) × 3,600 s × 24 hr × 30 days = $518,400",[342,890,891],{},"Storage: 5 GB\u002Fs × $0.045\u002FGB-month(st1) × 3,600 s × 24 hr × 7 days * 3 replicas = $408,240",[342,893,894],{},"Vendor Cost: $93,333 per month (based on limited information. 
See additional notes below).",[818,896,898],{"id":897},"additional-notes","Additional Notes",[339,900,901],{},[342,902,903,904,909],{},"Redpanda does not publicly disclose its BYOC pricing, making it difficult to accurately assess its total costs. We refer to information from the whitepaper “",[55,905,908],{"href":906,"rel":907},"https:\u002F\u002Fwww.redpanda.com\u002Fresources\u002Fredpanda-vs-confluent-performance-tco-benchmark-report#form",[264],"Redpanda vs. Confluent: A Performance and TCO Benchmark Report by McKnight Consulting Group.","” for estimation purposes. Based on the Tier-8 pricing model in the whitepaper, the estimated cost to support a 5GB\u002Fs workload would be $1.12 million per year ($93,333 per month). However, since this calculation is based on an estimation, we will revisit and refine the cost assessment once Redpanda publishes its BYOC pricing.",[48,911,912],{},[351,913],{"alt":18,"src":914},"\u002Fimgs\u002Fblogs\u002F679c602dc8a9859eed89a0ef_AD_4nXdbcO8vsNNPy4GtkNLlmNKf22fjxRvzLzH7CtOna1L08sTbvnZx3HhufeFqc1w4K2gEF7lxO2IR5supotxebAiGnA07Qa8Yr3Rd1pVK2LYKK4WurlJGwgdwwucZIFoF-N_2oBjY.png",[48,916,917],{},[351,918],{"alt":18,"src":919},"\u002Fimgs\u002Fblogs\u002F679c602d6bc1c2287e012540_AD_4nXfcHZnLfjbjIr3ZAgoQXT9dwP3aQCOQPmGZZJUtpNZSwE6qY6M3yehIaBxCwxEIeu5PVdUPY0zhyjnow26YfgjdYgSG4GnV9ibxu0YWTIpwng6z_F6FUGJMpERMKtpsFESzXSN_Sw.png",[339,921,922,925],{},[342,923,924],{},"When estimating the storage costs for Kafka and Redpanda, we assume the use of HDD storage at $0.045\u002FGB, based on the premise that both systems can fully utilize disk bandwidth without incurring the higher costs associated with GP2 or GP3 volumes. However, in practice, many users opt for GP2 or GP3, significantly increasing the total storage cost for Kafka and Redpanda.",[342,926,927],{},"Unlike disk-based solutions, S3 storage does not require capacity preallocation—Ursa only incurs costs for the actual data stored. This contrasts with Kafka and Redpanda, where preallocating storage can drive up expenses. As a result, the real-world storage costs for Kafka and Redpanda are often 50% higher than the estimates above.",[40,929,931],{"id":930},"conclusion","Conclusion",[48,933,934],{},"Ursa represents a transformative shift in streaming data infrastructure, offering cost efficiency, scalability, and flexibility without compromising durability or reliability. By leveraging a leaderless architecture and eliminating inter-zone data replication, Ursa reduces total cost of ownership by up to 86% compared to traditional leader-based streaming engines like Kafka and Redpanda. Its direct integration with cloud storage and scalable metadata & index management via Oxia ensure high availability and simplified infrastructure management.",[32,936,938],{"id":937},"balancing-latency-and-cost","Balancing Latency and Cost",[48,940,941,945],{},[55,942,944],{"href":943},"\u002Fblog\u002Fcap-theorem-for-data-streaming","Ursa trades off slightly higher latency for ultra-low cost",", making it an ideal choice for the majority of streaming workloads, especially those that prioritize throughput and cost savings over ultra-low latency. Meanwhile, StreamNative’s BookKeeper-based engine remains the preferred solution for real-time, latency-sensitive applications. 
By combining these two approaches, StreamNative empowers customers with the flexibility to choose the right engine for their specific needs—whether it's maximizing cost savings or achieving ultra low-latency real-time performance.",[32,947,949],{"id":948},"the-future-of-streaming-infrastructure","The Future of Streaming Infrastructure",[48,951,952],{},"In an era where data fuels AI, analytics, and real-time decision-making, managing infrastructure costs is critical to sustaining innovation. Ursa is not just a cost-cutting alternative—it is a forward-thinking, lakehouse-native platform that redefines how modern data streaming infrastructure should be built and operated.",[48,954,955,956,961],{},"Whether your priority is reducing costs, improving flexibility, or ingesting massive data into lakehouses, Ursa delivers a future-proof solution for the evolving demands of real-time data streaming. ",[55,957,960],{"href":958,"rel":959},"https:\u002F\u002Fconsole.streamnative.cloud\u002F",[264],"Get started"," with StreamNative Ursa today!",[963,964,966],"h1",{"id":965},"references","References",[48,968,969,972,973],{},[970,971,430],"span",{}," ",[55,974,975],{"href":975},"\u002Fblog\u002Fintroducing-oxia-scalable-metadata-and-coordination",[48,977,978,972,980],{},[970,979,379],{},[55,981,378],{"href":378},[48,983,984,972,987],{},[970,985,986],{},"StreamNative pricing",[55,988,989],{"href":989,"rel":990},"https:\u002F\u002Fdocs.streamnative.io\u002Fdocs\u002Fbilling-overview",[264],[48,992,993,972,996],{},[970,994,995],{},"WarpStream pricing",[55,997,998],{"href":998,"rel":999},"https:\u002F\u002Fwww.warpstream.com\u002Fpricing#pricingfaqs",[264],[48,1001,1002,972,1005],{},[970,1003,1004],{},"AWS S3 pricing",[55,1006,1007],{"href":1007,"rel":1008},"https:\u002F\u002Faws.amazon.com\u002Fs3\u002Fpricing\u002F",[264],[48,1010,1011,972,1014],{},[970,1012,1013],{},"AWS EBS pricing",[55,1015,1016],{"href":1016,"rel":1017},"https:\u002F\u002Faws.amazon.com\u002Febs\u002Fpricing\u002F",[264],[48,1019,1020,972,1023],{},[970,1021,1022],{},"AWS MSK pricing",[55,1024,1025],{"href":1025,"rel":1026},"https:\u002F\u002Faws.amazon.com\u002Fmsk\u002Fpricing\u002F",[264],[48,1028,1029,972,1032],{},[970,1030,1031],{},"The Brutal Truth about Kafka Cost Calculators",[55,1033,852],{"href":852,"rel":1034},[264],[48,1036,1037,972,1040],{},[970,1038,1039],{},"Redpanda vs. 
Confluent: A Performance and TCO Benchmark Report by McKnight Consulting Group",[55,1041,906],{"href":906,"rel":1042},[264],{"title":18,"searchDepth":19,"depth":19,"links":1044},[1045,1046,1047,1052,1056,1057,1066,1069],{"id":333,"depth":19,"text":334},{"id":372,"depth":19,"text":373},{"id":397,"depth":19,"text":398,"children":1048},[1049,1050,1051],{"id":409,"depth":279,"text":410},{"id":434,"depth":279,"text":435},{"id":455,"depth":279,"text":456},{"id":479,"depth":19,"text":480,"children":1053},[1054,1055],{"id":483,"depth":279,"text":484},{"id":498,"depth":279,"text":499},{"id":539,"depth":19,"text":540},{"id":551,"depth":19,"text":552,"children":1058},[1059,1060,1061,1062,1063,1064,1065],{"id":558,"depth":279,"text":559},{"id":604,"depth":279,"text":605},{"id":622,"depth":279,"text":623},{"id":669,"depth":279,"text":670},{"id":697,"depth":279,"text":698},{"id":715,"depth":279,"text":716},{"id":730,"depth":279,"text":731},{"id":775,"depth":19,"text":776,"children":1067},[1068],{"id":815,"depth":279,"text":816},{"id":930,"depth":19,"text":931,"children":1070},[1071,1072],{"id":937,"depth":279,"text":938},{"id":948,"depth":279,"text":949},"StreamNative Cloud","2025-01-31","Discover how Ursa achieves 5GB\u002Fs Kafka workloads at just 5% of the cost of traditional streaming engines like Redpanda and AWS MSK. See our benchmark results comparing infrastructure costs, total cost of ownership (TCO), and performance across leading Kafka vendors.","\u002Fimgs\u002Fblogs\u002F679c6593d25099b1cdcec4ca_image-31.png",{},"\u002Fblog\u002Fhow-we-run-a-5-gb-s-kafka-workload-for-just-50-per-hour","30 min",{"title":308,"description":1075},"blog\u002Fhow-we-run-a-5-gb-s-kafka-workload-for-just-50-per-hour",[1083,1084,303],"TCO","Apache Kafka","A0o_2xdJiLI6rf6xj4RKsxJNo_A6QN2fYzCp6gaLrFw",{"id":1087,"title":1088,"authors":1089,"body":1092,"category":1721,"createdAt":290,"date":1722,"description":1723,"extension":8,"featured":294,"image":290,"isDraft":294,"link":290,"meta":1724,"navigation":7,"order":296,"path":1725,"readingTime":1726,"relatedResources":290,"seo":1727,"stem":1728,"tags":1729,"__hash__":1730},"blogs\u002Fblog\u002Fextensible-load-balancer-pulsar-3-0.md","Introducing Extensible Load Balancer in Pulsar 3.0",[1090,1091],"Heesung Sohn","Kai Wang",{"type":15,"value":1093,"toc":1703},[1094,1106,1110,1119,1123,1126,1129,1138,1142,1145,1148,1151,1154,1157,1160,1164,1167,1170,1173,1176,1179,1182,1186,1195,1199,1208,1211,1235,1249,1252,1256,1261,1264,1268,1273,1276,1279,1288,1297,1300,1304,1312,1321,1324,1335,1340,1343,1346,1349,1352,1356,1359,1368,1371,1379,1388,1391,1395,1398,1403,1406,1409,1414,1417,1420,1425,1428,1431,1437,1440,1443,1446,1450,1453,1456,1459,1462,1467,1470,1473,1478,1481,1489,1491,1494,1497,1502,1505,1513,1515,1518,1521,1524,1527,1532,1534,1537,1540,1545,1548,1551,1559,1562,1564,1572,1574,1577,1580,1582,1587,1589,1594,1596,1604,1607,1610,1618,1621,1623,1626,1631,1638,1641,1646,1649,1657,1660,1668,1671,1676,1678,1681,1689,1698,1701],[1095,1096,1097],"blockquote",{},[48,1098,1099,1100,1105],{},"If you use the StreamNative Platform, refer to ",[55,1101,1104],{"href":1102,"rel":1103},"https:\u002F\u002Fdocs.streamnative.io\u002Fplatform\u002Fbroker-lb",[264],"this guide"," for steps to activate or update the Extensible Load Balancer. 
For those on the StreamNative Cloud, please reach out to the support team for help.",[40,1107,1109],{"id":1108},"intro","Intro",[48,1111,1112,1113,1118],{},"We are thrilled to introduce our latest addition to Apache Pulsar 3.0, ",[55,1114,1117],{"href":1115,"rel":1116},"https:\u002F\u002Fgithub.com\u002Fapache\u002Fpulsar\u002Fissues\u002F16691",[264],"Extensible Load Balancer",", which improves the existing Pulsar Broker Load Balancer. In this blog, we share the specifics of the enhancements and the obstacles we overcame during implementation.",[32,1120,1122],{"id":1121},"what-is-the-pulsar-broker-load-balancer","What is the Pulsar Broker Load Balancer?",[48,1124,1125],{},"The Pulsar Broker Load Balancer is a component within the Apache Pulsar messaging system. Pulsar’s compute-storage separation architecture enables the Pulsar Broker Load Balancer to seamlessly balance groups (bundles*) of topic sessions among brokers without copying messages. This helps ensure efficient broker resource utilization, prevents individual brokers from becoming overloaded or underloaded, and provides fault tolerance by promptly redistributing orphaned workloads to available brokers.",[48,1127,1128],{},"*Topics are grouped into bundles in Pulsar, and the bundle is the unit of broker load balancing.",[48,1130,1131,1132,1137],{},"The Pulsar community has recently made notable improvements to its ",[55,1133,1136],{"href":1134,"rel":1135},"https:\u002F\u002Fpulsar.apache.org\u002Fdocs\u002F3.1.x\u002Fconcepts-broker-load-balancing-overview\u002F",[264],"load balancer documentation",". We recommend looking at the updated documentation if you're interested in this topic.",[32,1139,1141],{"id":1140},"why-do-we-introduce-a-new-load-balancer","Why do we introduce a new load balancer?",[48,1143,1144],{},"Legacy Maintenance Issues: Over time, the old load balancer's historical design decisions created a long-standing maintenance challenge. Its design was not as modular as desired, making it hard to introduce new architectures, strategies, or logic without affecting existing functionality. This lack of modularity hindered experimentation with improvements and innovations, maintenance became increasingly difficult, and the resulting complexity slowed the delivery of new features and fixes, pointing to the need for a more modern and manageable solution.",[48,1146,1147],{},"Scalability Issues: As Pulsar clusters grew with more brokers and topics, the load balancer faced challenges in efficiently distributing metadata (including load balance data). The mechanism for replicating load data across brokers via metadata store (e.g., ZooKeeper) watchers became less scalable, resulting in potential performance bottlenecks and increased replication overhead.",[48,1149,1150],{},"Load Balancing Strategy: The previous load balancing strategy was not always optimal for evenly distributing the workload, especially when dealing with dynamic load changes such as adding or removing brokers.",[48,1152,1153],{},"Topic Availability During Unloading: The old load balancer could lead to resource access conflicts, causing longer temporary unavailability of topics during the unloading process, affecting the user experience and resource utilization.",[48,1155,1156],{},"Centralized Decision Making: Only the leader broker makes load balance decisions in the previous load balancer. 
### Why did we introduce a new load balancer?

**Legacy Maintenance Issues:** Historical design decisions left the old load balancer with a long-standing maintenance burden. Its design was not as modular as desired, so introducing new architectures, strategies, or logic without affecting existing functionality was difficult. This lack of modularity discouraged experimentation with improvements, and keeping up with maintenance grew increasingly costly, creating the need for a more modern and manageable solution.

**Scalability Issues:** As Pulsar clusters grew to more brokers and topics, the load balancer struggled to distribute metadata (including load-balance data) efficiently. The mechanism for replicating load data across brokers via metadata store (e.g., ZooKeeper) watchers did not scale well, resulting in performance bottlenecks and increased replication overhead.

**Load Balancing Strategy:** The previous strategy was not optimal at evenly distributing the workload, especially under dynamic load changes such as adding or removing brokers.

**Topic Availability During Unloading:** The old load balancer could run into resource access conflicts, causing longer temporary unavailability of topics during unloading and hurting both user experience and resource utilization.

**Centralized Decision Making:** Only the leader broker makes load-balance decisions in the previous load balancer. This centralized approach can create bottlenecks and limits the system's ability to distribute the workload efficiently.

**Operation:** Load-balance decisions were often hard to debug, and observability needed improvement.

### How do we solve the problems with the New Load Balancer?

**Legacy Maintenance Challenges:** The new load balancer is implemented in new classes with a cleaner design. This facilitates easier maintenance, updates, and the integration of new features without disrupting existing functionality, enhancing the system's manageability and adaptability over time.

**Scalability Issues:** To overcome scalability challenges, the new load balancer stores load and ownership data in Pulsar-native topics and reads them via Pulsar table views. This reduces replication overhead and potential bottlenecks, ensuring smooth load-data distribution even in larger Pulsar clusters.

**Load Balancing Strategy:** All load-balancing strategies (assignment, unloading, and splitting) were revisited in the new load balancer to ensure better workload distribution. It adapts to dynamic changes and efficiently handles broker additions and removals, resulting in a more balanced and optimized distribution of tasks. The idempotency of load-balance operations and states was also revisited so that retries after failures behave correctly.

**Topic Availability During Unloading:** The new load balancer minimizes topic unavailability during unloading by pre-assigning the new owner broker and gracefully transferring ownership with the bundle-transfer option. This minimizes resource access conflicts and reduces temporary topic downtime, enhancing user experience.

**Centralized Decision Making:** The new load balancer moves toward decentralized decision-making (assignment and splitting), distributing load-balance decisions to local brokers as much as possible rather than relying solely on a central leader. This minimizes bottlenecks, enabling more efficient and distributed workload management.

**Operation:** The new load balancer also introduces a new set of metrics and a debug-mode dynamic config that prints more useful load-balance decision logs.
### How do we enable the New Load Balancer?

The community updated the [load balancer migration steps](https://pulsar.apache.org/docs/next/concepts-broker-load-balancing-migration/) on the Pulsar website to explain how to migrate from the modular load balancer to the extensible load balancer and vice versa.
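As a rough sketch of what the migration guide describes, switching a broker over amounts to settings along these lines in broker.conf (double-check the exact class names against the linked migration steps for your Pulsar version):

```properties
# Use the extensible load manager instead of the modular one
# (class names as documented for Pulsar 3.0; verify against the migration guide).
loadManagerClassName=org.apache.pulsar.broker.loadbalance.extensions.ExtensibleLoadManagerImpl

# Pair it with the new shedding strategy described later in this post.
loadBalancerLoadSheddingStrategy=org.apache.pulsar.broker.loadbalance.extensions.scheduler.TransferShedder
```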
## Extensible Load Balancer Design

To summarize, the [modular (current) and extensible (new) load balancers](https://pulsar.apache.org/docs/next/concepts-broker-load-balancing-types/) implement similar load-balancing functionalities with different system designs.

For example, they both employ a similar approach to distributing data loads among brokers, including:

- Dynamic [bundle-broker assignment](https://pulsar.apache.org/docs/next/concepts-broker-load-balancing-concepts/#bundle-assignment)
- Dynamic [bundle splitting](https://pulsar.apache.org/docs/next/concepts-broker-load-balancing-concepts/#bundle-splitting)
- Dynamic [bundle unloading (shedding)](https://pulsar.apache.org/docs/next/concepts-broker-load-balancing-concepts/#bundle-unloading)

However, for the bundle ownership and load data stores, the modular load balancer uses a configurable metadata store (e.g., ZooKeeper), whereas the extensible load balancer uses Pulsar-native [system topics](https://pulsar.apache.org/docs/next/concepts-messaging/#system-topic) and [table views](https://pulsar.apache.org/docs/next/concepts-clients/#tableview).

Table View, available since Pulsar 2.10, provides a continuously updated key-value map view of compacted topic data. This greatly simplifies the new load balancer's data architecture: each broker only needs to publish load data to non-persistent (in-memory) system topics and replicate the latest view via table views. Similarly, for bundle ownership data, each broker publishes ownership-change messages to a persistent system topic and replicates the latest view via table views.
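The pattern is easy to see with the public client API. Here is a minimal sketch, with a hypothetical topic name and placeholder URL, of how a table view materializes the latest value per key the way the load balancer consumes load and ownership data:

```java
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.Schema;
import org.apache.pulsar.client.api.TableView;

public class TableViewSketch {
    public static void main(String[] args) throws Exception {
        try (PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650") // placeholder broker URL
                .build()) {
            // The table view keeps only the latest value per key, continuously
            // updated as new messages arrive on the compacted topic.
            TableView<String> ownership = client.newTableView(Schema.STRING)
                    .topic("persistent://public/default/ownership-demo") // hypothetical topic
                    .create();

            // Process every existing entry, then every future change.
            ownership.forEachAndListen((bundle, owner) ->
                    System.out.println(bundle + " is now owned by " + owner));

            Thread.sleep(60_000); // keep the view alive for the demo
        }
    }
}
```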
### Load Data Flow

![Load data flow](/imgs/blogs/650df06a725d024ec33df194_179900738-b492415f-713a-4860-84ef-ab2aa8577240.png)

The exchange of load data is critical to optimal load balancing, as incorrect or stale load data hurts balancing efficiency. In this new design, brokers periodically share their broker load and top-k bundle load data by publishing them to separate in-memory system topics. Each broker uses the broker load data for assignments and its local bundle load data for splitting, while the leader broker triggers global bundle unloading based on both the global broker and bundle load data. Decoupling the load data stores by use case keeps the data model clean and preserves the modularity of the load-balancing system.
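As an illustration only (the topic name, report format, and cadence below are hypothetical, not the broker's internal API), the reporting side reduces to a periodic producer on a non-persistent topic:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.Schema;

public class LoadReportSketch {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650") // placeholder URL
                .build();

        // Non-persistent topic: load reports are ephemeral and refreshed every
        // cycle, so skipping disk persistence keeps sharing fast and lightweight.
        Producer<String> reporter = client.newProducer(Schema.STRING)
                .topic("non-persistent://public/default/broker-load-demo") // hypothetical name
                .create();

        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> {
            // A real broker reports its resource usage plus only its top-k
            // hottest bundles, keeping the report size bounded.
            reporter.sendAsync("broker-1:maxUsage=0.42,topBundles=[0x00_0x40,0x40_0x80]");
        }, 0, 60, TimeUnit.SECONDS);

        Thread.sleep(180_000); // let a few reports go out, then clean up
        scheduler.shutdownNow();
        reporter.close();
        client.close();
    }
}
```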
### Bundle State Channel

![Bundle state machine](/imgs/blogs/650df08022183662e00eea8e_220518178-dadb7c34-f4c2-45ec-a85c-fa9f2ab1b2c3.png)

The new load balancer introduces a bundle state machine, like the one above, that defines the possible states and transitions in the bundle (group of topics) life cycle. To communicate these state changes across brokers and react to them, we introduced an event-sourced channel, the Bundle State Channel, in which each actor (broker) broadcasts state-transition messages to a system topic and plays its role as messages are received. Because these state changes persist in the system topic, the channel guarantees persistence, (eventual) consistency, and idempotency of bundle state changes, even across failures and retries.

Managing bundle ownership and resolving conflicts among brokers presents challenges. The complexity is heightened when multiple brokers concurrently claim ownership of the same bundle, or when such assignments occur during operations like bundle splitting or unloading. Effective conflict resolution is pivotal to maintaining system integrity. Several approaches are available:

1. Centralized Leadership Model: Designate a single leader among the brokers to oversee conflict resolution. The leader resolves ownership conflicts and ensures uniform state transitions. While this centralizes conflict resolution, it introduces a potential single point of failure and bottlenecks if the leader becomes overwhelmed.
2. Decentralized Approach: Each broker embeds an identical conflict resolution mechanism, and the brokers algorithmically deduce a consistent sequence of state transitions, at the cost of processing additional messages on each broker.

The present implementation pursues the latter approach to avoid reliance on a single leader. Conflict resolution logic is embedded in each broker, with the added benefit of "early broadcast": client lookups are deferred until ownership is finalized, which prevents clients from redundantly retrying lookups in the middle of bundle state changes. This logic is also straightforward to run on each broker without auxiliary metadata: given the linearized message sequence of the system topic, any message carrying a valid state transition and version ID is accepted; otherwise it is rejected (see the sketch below). To generalize this custom conflict resolution strategy, the Pulsar community introduced a [configurable conflict resolution strategy](https://github.com/apache/pulsar/issues/18099) for both topic compaction and table views (enabled only for system topics as of today).
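A minimal sketch of that accept/reject rule, with hypothetical state and event types (the actual broker classes differ): because every broker sees the same linearized sequence and applies the same deterministic check, all brokers converge on identical ownership state without a leader.

```java
import java.util.Map;

public class ConflictResolutionSketch {
    // Hypothetical bundle states, simplified from the state machine above.
    enum State { ASSIGNED, OWNED, RELEASING, SPLITTING }

    record Event(String bundle, State from, State to, long version) {}
    record Current(State state, long version) {}

    /**
     * Every broker runs this identical check on the linearized channel
     * messages, so no leader or auxiliary metadata is needed to agree
     * on which ownership transitions win.
     */
    static boolean accept(Map<String, Current> table, Event e) {
        Current cur = table.get(e.bundle());
        // Reject stale or out-of-order proposals: the version must advance by
        // exactly one from the state every broker has already applied.
        if (cur != null && (e.version() != cur.version() + 1 || e.from() != cur.state())) {
            return false;
        }
        table.put(e.bundle(), new Current(e.to(), e.version()));
        return true;
    }
}
```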
Also, we need to ensure that bundle ownership integrity recovers from disaster cases such as network failures and broker crashes. Failing this disaster recovery can cause ownership inconsistency, orphaned ownerships, or state changes stuck in in-transit states. To rectify such invalid ownership states, the leader broker listens for broker unavailability and metadata (ZooKeeper) connection instability and assigns new brokers accordingly. The leader also periodically monitors bundle states and fixes any invalid states that linger too long.

### TransferShedder

The new load balancer introduces a new shedding strategy, [TransferShedder](https://pulsar.apache.org/docs/next/concepts-broker-load-balancing-concepts/#transfershedder). Here, we would like to highlight the following characteristics.

One major improvement is that the bundle-transfer option makes the unloading process more graceful. Previously, upon unloading, the modular load balancer relied on clients' lookups to assign new owner brokers via the leader broker. (Note that the modular load balancer has [recently improved this behavior](https://github.com/apache/pulsar/pull/20822).) With the bundle-transfer option (enabled by default), TransferShedder pre-assigns the new owner brokers, letting clients bypass the client-leader-involved assignment.

The other major algorithmic change is that, with this transfer protocol, TransferShedder unloads bundles from the next-highest-loaded brokers to the next-lowest-loaded brokers until all of the following are true:

- The standard deviation of the broker load distribution is below the configured threshold.
- There are no significantly underloaded brokers.
- There are no significantly overloaded brokers.

![TransferShedder load distribution](/imgs/blogs/650df0b43c80ac729a203250_image.jpeg)

Essentially, the goal is to bring the load distribution under the target in a minimal number of steps. TransferShedder tracks the global load-score distribution (its standard deviation) and tries to keep it below the configured threshold, loadBalancerBrokerLoadTargetStd, by moving load from the highest- to the lowest-loaded brokers. If there are any outliers (significantly underloaded or overloaded brokers), it prioritizes unloading them first.

This also helps load-balance convergence. Overly aggressive load balancing can result in infinite unloading or bundle oscillation (bouncing bundles). For example, if one broker is slightly more loaded than the others, unloading a bundle from it might overload another broker (again, slightly more than the others). If the targeted bundle unloading would not be effective, the logic should stop further unloading to avoid this oscillation. The bundle-transfer option enables TransferShedder to account for this case, which helps the load balance converge.
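A toy sketch of that stopping condition (the scores and outlier margin are illustrative; the real logic lives in TransferShedder): unloading continues only while the load-score standard deviation exceeds the target or outliers remain.

```java
import java.util.List;

public class SheddingConditionSketch {
    /** Decide whether another transfer cycle is worthwhile. */
    static boolean shouldContinueShedding(List<Double> loadScores,
                                          double targetStd,      // cf. loadBalancerBrokerLoadTargetStd
                                          double outlierDelta) { // illustrative outlier margin
        double avg = loadScores.stream().mapToDouble(Double::doubleValue).average().orElse(0);
        double variance = loadScores.stream()
                .mapToDouble(s -> (s - avg) * (s - avg)).average().orElse(0);
        double std = Math.sqrt(variance);

        boolean hasOutlier = loadScores.stream()
                .anyMatch(s -> Math.abs(s - avg) > outlierDelta);

        // Stop once the distribution is tight enough and no broker is an
        // outlier; this avoids bundle oscillation from over-shedding.
        return std > targetStd || hasOutlier;
    }

    public static void main(String[] args) {
        System.out.println(shouldContinueShedding(List.of(0.9, 0.3, 0.4), 0.1, 0.3));   // true
        System.out.println(shouldContinueShedding(List.of(0.45, 0.5, 0.48), 0.1, 0.3)); // false
    }
}
```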
TransferShedder uses the same methodology for broker load-score computation as ThresholdShedder: an exponential moving average of the maximum of the weighted resource usages among CPU, memory, and network load. It also introduces the loadBalancerSheddingConditionHitCountThreshold config to further control the sensitivity of unloading decisions when the traffic pattern is spiky. Traffic can burst and subside quickly, and users may want to avoid triggering unloading for such bursts; increasing this threshold makes unloading less sensitive to them.

Additionally, the extensible load balancer exposes the loadBalancerMaxNumberOfBundlesInBundleLoadReport and loadBalancerMaxNumberOfBrokerSheddingPerCycle configs to control the maximum number of bundles and brokers involved in each unloading cycle (default: 1 min). These configs help users who need to slow down the load-balance impact and limit the bundles and brokers affected per cycle.
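Putting the knobs from this section together, a broker.conf tuning sketch might look like the following (the values are illustrative, not recommendations; the setting names are those discussed above):

```properties
# Target standard deviation of broker load scores; shedding stops
# once the distribution is tighter than this.
loadBalancerBrokerLoadTargetStd=0.25

# Require the shedding condition to hit several consecutive cycles
# before unloading, so short traffic bursts don't trigger transfers.
loadBalancerSheddingConditionHitCountThreshold=3

# Cap the blast radius of each unloading cycle.
loadBalancerMaxNumberOfBundlesInBundleLoadReport=10
loadBalancerMaxNumberOfBrokerSheddingPerCycle=3
```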
### Operational Improvement

Recently, Pulsar improved the bundle unload command to allow specifying the destination broker. This continues to work with the new load balancer, so if manual unloading is needed, admins can use this command as a one-time resolution.

Operationally, we introduced additional metrics for this new load balancer, and the community recently [updated the metrics page](https://pulsar.apache.org/docs/3.1.x/reference-metrics/#loadbalancing-metrics) to reflect the additions. In short, we now expose breakdown metrics for decision counts, grouped by a reason label.

It is also possible to closely monitor the load score of each broker. This reveals the actual load score used in load-balance decisions, rather than requiring you to track the raw signals such as memory, CPU, and network load. There are also metrics for the current load-score distribution (average and standard deviation):

- pulsar_lb_resource_usage_stats{feature=max_ema, stat=avg} (gauge) - The average of the brokers' load scores.
- pulsar_lb_resource_usage_stats{feature=max_ema, stat=std} (gauge) - The standard deviation of the brokers' load scores.

We added a [new sample load balancer dashboard for these metrics](https://github.com/streamnative/apache-pulsar-grafana-dashboard/pull/93), so please try it and let us know if you have any questions about how to read them.

Lastly, we added a dynamic config, loadBalancerDebugModeEnabled. Logs are often the best way to debug issues, and under this debug flag we log as many load-balance decisions as possible. You can enable this flag without restarting brokers and check the logs for the load-balance decisions, which can help with tuning the configs. Once debugging is done, simply turn the flag off again, also without restarting brokers.
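Because it is a dynamic config, the flag can be flipped through the admin API as well as the CLI; here is a minimal sketch with the Java admin client (the endpoint is a placeholder):

```java
import org.apache.pulsar.client.admin.PulsarAdmin;

public class ToggleLbDebugMode {
    public static void main(String[] args) throws Exception {
        try (PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://localhost:8080") // placeholder endpoint
                .build()) {
            // Turn on verbose load-balance decision logging,
            // with no broker restart required.
            admin.brokers().updateDynamicConfiguration("loadBalancerDebugModeEnabled", "true");

            // ... inspect broker logs, tune configs ...

            // Turn it back off when done.
            admin.brokers().updateDynamicConfiguration("loadBalancerDebugModeEnabled", "false");
        }
    }
}
```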
## Modular (Current) vs. Extensible (New) Load Balancer Performance Tests

We performed four tests to evaluate the performance improvement from the new load balancer. Each test ran separately on the existing load balancer (modular) and the new one (extensible), using Pulsar 3.0.

1. Assignment Scalability Test:

Goal: When many clients reconnect at once, many systems, including Pulsar, suffer from a "thundering herd" of reconnection (lookup) requests that suddenly bombards the brokers. We expect the new load balancer's shortened lookup path to help in this scenario.

Methodology: We measure the start and end time to reconnect a large number of publishers (100k) when a large cluster (100 brokers) with many bundles (60k) restarts all brokers within a short time frame (2 mins).

2. Assignment Latency Test:

Goal: We are also interested in how the new load balancer improves individual message delays (how quickly an individual message can be re-published) when restarting brokers one by one. Again, we expect the shortened lookup path to reduce latency here.

Methodology: We compare the p99.99 latency of messages (10k partitions, 1,000 bundles at 1,000 msgs/s) published while a cluster (10 brokers) restarts its brokers one by one.

3. Unload Test:

Goal: Automatic topic (bundle) unloading helps load balancing, especially when scaling brokers up or down, because such scaling events suddenly cause load imbalance. We expect the new way of sharing load data, via in-memory non-persistent topics, to propagate load data faster and more lightly than the metadata store (ZooKeeper). We also want to compare the new unloading strategy, TransferShedder, with the current default, ThresholdShedder.

Methodology: We compare the time to unload and rebalance the load (100 bundles, 10k topics/publishers) when a set of brokers joins or leaves the cluster (5→10 and 10→5 broker scaling).

4. Split (Hot-spot) Test:

Goal: Automatic bundle splitting is the other important Pulsar load-balance feature for topics that suddenly become overloaded ("hot spots"). A bundle split isolates such hot-spot topics by splitting the owner bundle into smaller pieces; the child bundles can then be unloaded to other brokers more easily, reducing the load on the affected broker. We want to measure how the new load balancer improves this process.

Methodology: We compare the time to split one bundle into 128 bundles and rebalance the load (10k topics/publishers) when the topics carry a high load.

### Test Results

Assignment Scalability Test Result

100k publisher connection recovery time.

Modular LB (restart at 12:25):

![Modular LB: 100k publisher connection recovery](https://uploads-ssl.webflow.com/639226d67b0d723af8e7ca56/650df1556821f1691a39d5d6_image%20(10).png)

Extensible LB (restart at 09:33):

![Extensible LB: 100k publisher connection recovery](https://uploads-ssl.webflow.com/639226d67b0d723af8e7ca56/650df1380d6c6e1a2489c0b6_image%20(4).png)

Publisher connection recovery time:

- Modular LB: 20 mins
- Extensible LB: 10 mins

Assignment Latency Test Result

p99.99 publish latency while restarting brokers one by one (10 brokers total):

![p99.99 publish latency during rolling restart](https://uploads-ssl.webflow.com/639226d67b0d723af8e7ca56/650df16c5c0fb7011197b187_image%20(5).png)

p99.99 publish latency:

- Modular LB: 1841 ms
- Extensible LB: 1228 ms

Unload Test Result

Modular LB (scaled down from 10 to 5 at 01:38; scaled up from 5 to 10 at 01:58):

![Modular LB: unload test](https://uploads-ssl.webflow.com/639226d67b0d723af8e7ca56/650df18c9f9e9b8097b4948b_image%20(6).png)

Extensible LB (scaled down from 10 to 5 at 21:58; scaled up from 5 to 10 at 22:08):

![Extensible LB: unload test](https://uploads-ssl.webflow.com/639226d67b0d723af8e7ca56/650df19b0d6c6e1a248a133a_image%20(7).png)

Case 1: Time to rebalance the load after scaling down from 10 brokers to 5:

- Modular LB: 5 mins
- Extensible LB: 3 mins

Case 2: Time to rebalance the load after scaling up from 5 brokers to 10:

- Modular LB: 7 mins
- Extensible LB: 5 mins

Split Test Result

Time to rebalance the load by splitting one bundle (up to 128 bundles) and unloading to 10 brokers.

Modular LB:

![Modular LB: split test](https://uploads-ssl.webflow.com/639226d67b0d723af8e7ca56/650df1aa4a302a693609ed53_image%20(8).png)

Extensible LB:

![Extensible LB: split test](https://uploads-ssl.webflow.com/639226d67b0d723af8e7ca56/650df1b65e7e9bcbec79b1c2_image%20(9).png)

Time to rebalance the load:

- Modular LB: 15 mins
- Extensible LB: 13 mins

Also, with loadBalancerBrokerLoadTargetStd=0.1, the new load manager achieves a better topic-count balance, max − min = 1.1k − 779 = 321, than the old load manager's 1.6k − 394 ≈ 1.2k, which is roughly 4x better.

Max topic count − min topic count:

- Modular LB: 1.6k − 394 ≈ 1.2k
- Extensible LB: 1.1k − 779 = 321

Please note that the split and unloading cycles run concurrently, so unloading can be delayed if the next split fires before unloading completes. We could further optimize this behavior by splitting parent bundles n-way instead of the current 2-way and triggering unloading immediately after splits. Meanwhile, users can tune loadBalancerSplitIntervalMinutes (default: 1 min) and loadBalancerSheddingIntervalMinutes (default: 1 min) if they need to adjust those frequencies, as in the snippet below.
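For example (illustrative values; both settings and their one-minute defaults are named above):

```properties
# Run splits less often so each round of unloading can finish first.
loadBalancerSplitIntervalMinutes=2

# Keep shedding at its default cadence.
loadBalancerSheddingIntervalMinutes=1
```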
Test Result Summary

![Test result summary table](/imgs/blogs/650dfa4ad3d1705f95192450_Screenshot-2023-09-22-at-10.33.42-PM.png)

As we shared earlier in "How do we solve the problems with the New Load Balancer?", the new load balancer implemented the following changes, and we are glad to report that they improve the load-balance scenarios above by up to 2x.

**Distributed load-balance decisions**

- Topic lookup and split decisions happen on every broker instead of going through the leader.

**Optimized load-data sharing**

- Load data travels a shorter path. Broker and bundle load data are shared with other brokers via non-persistent (in-memory) Pulsar system topics, without the disk persistence of the metadata store (ZooKeeper). This keeps load-balance decisions more up to date and takes Pulsar one step closer to a ZooKeeper-less architecture.
- The amount of shared load data is minimized. Each broker shares only its top-k bundles' load instead of all of them, which scales better when there are many bundles. Broker and bundle load data are decoupled into different topics because their update cadences and consumption patterns differ.

**Optimized ownership-data sharing**

- Ownership data is shared via a Pulsar system topic instead of the metadata store (ZooKeeper).
- Bundle ownership is transferred (pre-assigned) to other brokers upon unloading and broker shutdown.

**Improved shedding algorithm**

- TransferShedder improves the unloading behavior to redistribute the load in minimal steps.

## Conclusion

The Extensible Load Balancer reduces Pulsar's ZooKeeper dependencies by using Pulsar-native topics and table views. Along with this architectural change, the test data shows that distributed load-balance decisions, optimized load- and ownership-data sharing, and the new load-balance algorithms with the bundle-transfer option together improve broker load-balance performance.

Last year, the Pulsar community worked hard to bring this load balancer improvement project to the public, including the [load balancer docs and migration steps](https://pulsar.apache.org/docs/next/concepts-broker-load-balancing-migration/). We very much appreciate all of the contributors to this project, and we are excited to introduce this new load balancer in Pulsar 3.0 with promising performance results.

Beyond this load balancer improvement, there are other innovations in Pulsar 3.0. We strongly recommend checking the [Pulsar 3.0 release post](https://pulsar.apache.org/blog/2023/05/02/announcing-apache-pulsar-3-0/), and we look forward to feedback and contributions from the Pulsar community.

StreamNative proudly holds the position of a major contributor to the development of Apache Pulsar. Our dedication to driving innovation within the Apache Pulsar project remains resolute, and we are steadfast in our commitment to pushing its boundaries even further.

About the authors: Heesung Sohn is a platform engineer at StreamNative based in the San Francisco Bay Area; he previously worked on scaling Aurora MySQL internals for its Serverless feature at AWS. Kai Wang is a software engineer at StreamNative.