[{"data":1,"prerenderedAt":1884},["ShallowReactive",2],{"active-banner":3,"navbar-featured-partner-blog":24,"navbar-pricing-featured":306,"blog-\u002Fblog\u002Fapache-pulsar-vs-apache-kafka-2022-benchmark":1086,"blog-authors-\u002Fblog\u002Fapache-pulsar-vs-apache-kafka-2022-benchmark":1831,"related-\u002Fblog\u002Fapache-pulsar-vs-apache-kafka-2022-benchmark":1863},{"id":4,"title":5,"date":6,"dismissible":7,"extension":8,"link":9,"link2":10,"linkText":11,"linkText2":12,"meta":13,"stem":21,"variant":22,"__hash__":23},"banners\u002Fbanners\u002Flakestream-ufk-launch.md","StreamNative Introduces Lakestream Architecture and Launches Native Kafka Service","2026-04-07",true,"md","\u002Fblog\u002Ffrom-streams-to-lakestreams","https:\u002F\u002Fconsole.streamnative.cloud\u002Fsignup?from=banner_lakestream-launch","Read Announcement","Sign Up Now",{"body":14},{"type":15,"value":16,"toc":17},"minimark",[],{"title":18,"searchDepth":19,"depth":19,"links":20},"",2,[],"banners\u002Flakestream-ufk-launch","default","zRueBGutATZB0ZnFFHwaEV7F0Di4tnZUHhgOiI4cu6k",{"id":25,"title":26,"authors":27,"body":29,"category":289,"createdAt":290,"date":291,"description":292,"extension":8,"featured":7,"image":293,"isDraft":294,"link":290,"meta":295,"navigation":7,"order":296,"path":297,"readingTime":298,"relatedResources":290,"seo":299,"stem":300,"tags":301,"__hash__":305},"blogs\u002Fblog\u002Fstreamnative-recognized-in-the-forrester-wave-streaming-data-platforms-2025.md","StreamNative Recognized as a Contender in The Forrester Wave™: Streaming Data Platforms, Q4 2025",[28],"David Kjerrumgaard",{"type":15,"value":30,"toc":276},[31,39,47,51,67,73,78,81,87,102,109,115,118,124,127,134,140,143,146,157,163,169,172,175,178,184,191,194,197,204,207,210,224,229,233,237,241,245,249,251,268,270],[32,33,35],"h3",{"id":34},"receives-highest-possible-scores-in-both-the-messaging-and-resource-optimization-criteria",[36,37,38],"em",{},"Receives Highest Possible Scores in BOTH the Messaging and Resource Optimization Criteria",[40,41,43],"h2",{"id":42},"introduction",[44,45,46],"strong",{},"Introduction",[48,49,50],"p",{},"Real-time data has become the backbone of modern innovation. As artificial intelligence (AI) and digital services demand instantaneous insights, organizations are realizing that streaming data is no longer optional – it's essential for delivering timely, context-rich experiences. StreamNative's data streaming platform is built precisely for this reality, ensuring data is immediate, reliable, and ready to power critical applications.",[48,52,53,54,63,64],{},"Today, we're excited to announce that Forrester Research has named StreamNative as a Contender in its evaluation, ",[55,56,58],"a",{"href":57},"\u002Freports\u002Frecognized-in-the-forrester-wave-tm-streaming-data-platforms-q4-2025",[36,59,60],{},[44,61,62],{},"The Forrester Wave™: Streaming Data Platforms, Q4 2025",". 
This report evaluated 15 top streaming data platform providers, and we're proud to share that ",[44,65,66],{},"StreamNative received the highest scores possible—5 out of 5—in both the Messaging and Resource Optimization criteria.",[48,68,69,70],{},"Forrester's Take: ",[36,71,72],{},"\"StreamNative is a good fit for enterprises that want an Apache Pulsar implementation that is also compatible with Kafka APIs.\"",[48,74,75],{},[36,76,77],{},"— The Forrester Wave™: Streaming Data Platforms, Q4 2025",[48,79,80],{},"Being recognized in the Forrester Wave is a proud milestone, and for us, it highlights how far StreamNative has come in enabling enterprises to unlock the power of real-time data. In the sections below, we'll dive into what we believe sets StreamNative apart—from our modern architecture and cloud-native design to our open-source foundation and real-time use cases—and how we see these strengths aligning with Forrester's findings.",[40,82,84],{"id":83},"trusted-by-industry-leaders",[44,85,86],{},"Trusted by Industry Leaders",[48,88,89,90,93,94,97,98,101],{},"Companies across industries are already leveraging StreamNative to drive real-time outcomes. Global enterprises like ",[44,91,92],{},"Cisco"," rely on StreamNative to handle massive IoT telemetry, supporting 245 million+ connected devices. Martech leaders such as ",[44,95,96],{},"Iterable"," process billions of events per day with StreamNative for hyper-personalized customer engagement. And in financial services, ",[44,99,100],{},"FICO"," trusts StreamNative to power its real-time fraud detection and analytics pipelines with a secure, scalable streaming backbone.",[48,103,104,105,108],{},"The Forrester report notes that “",[36,106,107],{},"Customers appreciate the lower infrastructure costs that result from StreamNative’s cost-efficient, Kafka-compatible architecture. Customers note excellent support responsiveness…","”",[40,110,112],{"id":111},"modern-cloud-native-architecture-built-for-scale",[44,113,114],{},"Modern, Cloud-Native Architecture Built for Scale",[48,116,117],{},"From day one, StreamNative was designed with a modern architecture to meet the demanding scale and flexibility requirements of real-time data. Unlike legacy streaming systems that often rely on tightly coupled storage and compute, StreamNative's platform takes a cloud-native approach: it decouples these layers to enable elastic scalability and efficient resource utilization across any environment. The core is powered by Apache Pulsar—a distributed messaging and streaming engine—enhanced with multi-protocol support (including native Apache Kafka API compatibility) to unify diverse data streams under one roof. This means organizations can consolidate siloed messaging systems and handle both high-volume event streams and traditional message queues on a single platform, without sacrificing performance or reliability.",[48,119,120,121,108],{},"Forrester's evaluation described that “",[36,122,123],{},"StreamNative aims to provide a high-performance, multi-protocol streaming data platform: It uses Apache Pulsar with Kafka API compatibility to deliver cost-efficient, real-time applications for enterprises. 
It appeals to organizations that want a flexible, low-cost streaming solution, due to its focus on scalability and resource optimization, while its investments in Pulsar’s open-source ecosystem and performance optimization make it the primary platform for enterprises wishing to implement Pulsar.",[48,125,126],{},"Our cloud-first, leaderless architecture (with no single broker bottlenecks) and tiered storage model were built to maximize throughput and cost-efficiency for real-time workloads. By separating compute from storage and leveraging distributed object storage, StreamNative can retain huge volumes of event data indefinitely while keeping compute costs in check—effectively providing a flexible, low-cost streaming solution.",[48,128,129,130,133],{},"This modern design not only delivers high performance, but also ensures fault tolerance and geo-distribution out of the box, so enterprises can trust their streaming data is always available and durable. As Forrester’s evaluation noted, StreamNative ",[36,131,132],{},"\"excels at messaging and resource optimization\" and “Its platform supports use cases like real-time analytics and event-driven architectures with robust scalability.","” Our architecture provides the strong foundation that today's real-time applications demand, from ultra-fast data ingestion to seamless scale-out across hybrid and multi-cloud environments.",[40,135,137],{"id":136},"open-source-foundation-and-pulsar-expertise",[44,138,139],{},"Open Source Foundation and Pulsar Expertise",[48,141,142],{},"StreamNative's DNA is rooted in open source innovation. Our founders are the original creators of Apache Pulsar, and we've built our platform with the same open principles: freedom, flexibility, and community-driven innovation. For developers and data teams, this means adopting StreamNative comes with no proprietary lock-in—instead, you get a platform built on open standards and a thriving ecosystem. We offer broad API compatibility (Pulsar, Kafka, JMS, MQTT, and more) so that teams can work with familiar interfaces and integrate StreamNative into existing systems with ease.",[48,144,145],{},"StreamNative is the primary commercial contributor to the Apache Pulsar project and its surrounding ecosystem. We invest heavily in Pulsar's ongoing improvements, and our investments in Pulsar's open-source ecosystem and performance optimization bolster StreamNative's value. We also foster a vibrant community through initiatives like the Data Streaming Summit and free training resources.",[48,147,148,149,152,153,156],{},"Forrester's assessment noted that StreamNative’s “",[36,150,151],{},"events-driven agents, extensibility, and performance architecture are solid,","” and we're continuing to build on that foundation. ",[44,154,155],{},"We're actively investing in expanding our tooling for observability, governance, schema management, and developer productivity","—areas we recognize as critical for enterprise adoption and where we're committed to accelerating our roadmap.",[48,158,159,160],{},"Being open also means embracing an open ecosystem of technologies. StreamNative actively integrates with the tools and platforms that matter most to our users. We partner with industry leaders like Snowflake, Databricks, Google, and Ververica to ensure our streaming platform works seamlessly with data warehouses, lakehouse storage, and stream processing frameworks. 
Forrester’s evaluation observed that StreamNative’s ",[36,161,162],{},"\"investments in Pulsar’s open-source ecosystem and performance optimization make it the primary platform for enterprises wishing to implement Pulsar.\"",[40,164,166],{"id":165},"powering-real-time-use-cases-across-industries",[44,167,168],{},"Powering Real-Time Use Cases Across Industries",[48,170,171],{},"One of the greatest validations of StreamNative's approach is the success our customers are achieving with real-time data. StreamNative's platform is versatile and use-case agnostic—if an application demands high-volume, low-latency data movement, we can power it. This flexibility is why our customer base spans industries from finance and IoT to major automobile manufacturers and online gaming. The common thread is that these organizations need to process and react to data in milliseconds, and StreamNative is delivering the capabilities to make that possible.",[48,173,174],{},"Cisco uses StreamNative to underpin an IoT telemetry system of colossal scale, connecting hundreds of millions of devices and thousands of enterprise clients with real-time data streams. The platform's multi-tenant design and proven reliability allow Cisco to offer its customers a live feed of device data with unwavering confidence. In the financial sector, FICO has built streaming pipelines on StreamNative to detect fraud as transactions happen and to monitor systems in real time. With StreamNative's strong guarantees around message durability and ordering, FICO can catch anomalies or suspicious patterns within seconds. And in digital customer engagement, Iterable relies on StreamNative to process billions of events every day—clicks, views, purchases—so that marketers can trigger personalized campaigns instantly based on user behavior.",[48,176,177],{},"Our customers uniformly deal with mission-critical data streams, where downtime or delays are unacceptable. StreamNative's fault-tolerant, scalable infrastructure has proven equal to the task, handling scenarios like bursting to millions of events per second or seamlessly spanning multiple cloud regions. Forrester's report recognized StreamNative for supporting event-driven architectures with robust scalability—which for us is a reflection of our platform's ability to meet the most demanding enterprise requirements.",[40,179,181],{"id":180},"continuing-to-innovate-ursa-orca-and-the-road-ahead",[44,182,183],{},"Continuing to Innovate: Ursa, Orca, and the Road Ahead",[48,185,186,187,190],{},"While we are thrilled to be recognized in Forrester's Streaming Data Platforms Wave, we view this as just the beginning. StreamNative's vision has always been bold: to ",[44,188,189],{},"provide a unified platform that not only handles today's streaming needs but also anticipates the emerging requirements of tomorrow",".",[48,192,193],{},"One key area of focus is the convergence of streaming data with advanced analytics and AI. As Forrester points out in the report, technology leaders should look for platforms that natively integrate messaging, stream processing, and analytics to provide AI agents with real-time, contextualized information. We couldn't agree more. 
Our award-winning Ursa Engine and Orca Agent Engine are aimed at extending our platform up the stack—bridging the gap between data streams and data lakes, and between event streams and intelligent processing.",[48,195,196],{},"Our new Ursa Engine introduces a lakehouse-native approach to streaming: it can write events directly to table formats like Iceberg on cloud storage, eliminating entire classes of ETL jobs and making fresh data instantly available for analytics queries. By integrating streaming and lakehouse technologies, we help customers collapse data silos and accelerate their AI\u002FML pipelines.",[48,198,199,200,203],{},"Beyond analytics integration, we are also enhancing StreamNative with more out-of-the-box processing and governance capabilities. In the coming months, we plan to introduce new features for lightweight stream processing and transformation, making it easier to build reactive applications directly on the platform. We're also expanding our ecosystem of connectors and integrations, so that whether your data lands in Snowflake, Databricks, or an AI model, StreamNative will seamlessly feed it. ",[44,201,202],{},"We're investing significantly in enterprise features including security, schema registry, governance, and monitoring tooling","—capabilities that are essential for mission-critical deployments and where we're committed to continued improvement.",[48,205,206],{},"This recognition from Forrester energizes us to keep innovating at full speed. We're sharing this honor with our amazing customers, community, and partners who drive us forward every day. Your feedback and real-world challenges have helped shape StreamNative into what it is today, and together, we will shape the future of streaming data. Thank you for joining us on this journey—we're just getting started, and we can't wait to deliver even more value as we continue to evolve our platform. 
Onward to real-time everything!",[208,209],"hr",{},[32,211,213],{"id":212},"streamnative-in-the-forrester-wave-evaluation-findings",[44,214,215,216,223],{},"StreamNative in ",[44,217,218],{},[55,219,220],{"href":57},[44,221,222],{},"The Forrester Wave™",": Evaluation Findings",[225,226,228],"h5",{"id":227},"recognized-as-a-contender-among-15-streaming-data-platform-providers","• Recognized as a Contender among 15 streaming data platform providers",[225,230,232],{"id":231},"received-the-highest-scores-possible-50-in-both-the-messaging-and-resource-optimization-criteria","* Received the highest scores possible (5.0) in both the Messaging and Resource Optimization criteria",[225,234,236],{"id":235},"cited-as-the-primary-platform-for-enterprises-wishing-to-implement-pulsar","• Cited as the primary platform for enterprises wishing to implement Pulsar",[225,238,240],{"id":239},"noted-for-excelling-at-messaging-and-resource-optimization","• Noted for excelling at messaging and resource optimization",[225,242,244],{"id":243},"customers-cited-lower-infrastructure-costs-and-excellent-support-responsiveness","• Customers cited lower infrastructure costs and excellent support responsiveness",[225,246,248],{"id":247},"recognized-for-supporting-event-driven-architectures-with-robust-scalability","• Recognized for supporting event-driven architectures with robust scalability",[208,250],{},[252,253,255,256,259,260,190],"h6",{"id":254},"forrester-disclaimer-forrester-does-not-endorse-any-company-product-brand-or-service-included-in-its-research-publications-and-does-not-advise-any-person-to-select-the-products-or-services-of-any-company-or-brand-based-on-the-ratings-included-in-such-publications-information-is-based-on-the-best-available-resources-opinions-reflect-judgment-at-the-time-and-are-subject-to-change-for-more-information-read-about-forresters-objectivity-here","**Forrester Disclaimer: **",[36,257,258],{},"Forrester does not endorse any company, product, brand, or service included in its research publications and does not advise any person to select the products or services of any company or brand based on the ratings included in such publications. Information is based on the best available resources. Opinions reflect judgment at the time and are subject to change",". *For more information, read about Forrester’s objectivity *",[55,261,265],{"href":262,"rel":263},"https:\u002F\u002Fwww.forrester.com\u002Fabout-us\u002Fobjectivity\u002F",[264],"nofollow",[36,266,267],{},"here",[208,269],{},[252,271,273],{"id":272},"apache-apache-pulsar-apache-kafka-apache-flink-and-other-names-are-trademarks-of-the-apache-software-foundation-no-endorsement-by-apache-or-other-third-parties-is-implied",[36,274,275],{},"Apache®, Apache Pulsar®, Apache Kafka®, Apache Flink® and other names are trademarks of The Apache Software Foundation. No endorsement by Apache or other third parties is implied.",{"title":18,"searchDepth":19,"depth":19,"links":277},[278,280,281,282,283,284,285],{"id":34,"depth":279,"text":38},3,{"id":42,"depth":19,"text":46},{"id":83,"depth":19,"text":86},{"id":111,"depth":19,"text":114},{"id":136,"depth":19,"text":139},{"id":165,"depth":19,"text":168},{"id":180,"depth":19,"text":183,"children":286},[287],{"id":212,"depth":279,"text":288},"StreamNative in The Forrester Wave™: Evaluation Findings","Company",null,"2025-12-16","StreamNative is recognized in The Forrester Wave™: Streaming Data Platforms, Q4 2025. 
Discover why Forrester highlights StreamNative's high-performance messaging, efficient resource use, and cost-effective Kafka API compatibility for real-time innovation.","\u002Fimgs\u002Fblogs\u002F693bd36cf01b217dcb67278f_Streamnative_blog_thumbnail.png",false,{},0,"\u002Fblog\u002Fstreamnative-recognized-in-the-forrester-wave-streaming-data-platforms-2025","10 mins read",{"title":26,"description":292},"blog\u002Fstreamnative-recognized-in-the-forrester-wave-streaming-data-platforms-2025",[302,303,304],"Announcements","Real-Time","Forrester","sOeeJtEO3O-IIfTPJjY1AFOMawZ_rf8FOH8A98NEKgU",{"id":307,"title":308,"authors":309,"body":314,"category":1073,"createdAt":290,"date":1074,"description":1075,"extension":8,"featured":7,"image":1076,"isDraft":294,"link":290,"meta":1077,"navigation":7,"order":296,"path":1078,"readingTime":1079,"relatedResources":290,"seo":1080,"stem":1081,"tags":1082,"__hash__":1085},"blogs\u002Fblog\u002Fhow-we-run-a-5-gb-s-kafka-workload-for-just-50-per-hour.md","How We Run a 5 GB\u002Fs Kafka Workload for Just $50 per Hour",[310,311,312,313],"Matteo Meril","Neng Lu","Hang Chen","Penghui Li",{"type":15,"value":315,"toc":1043},[316,319,322,325,328,331,335,338,348,354,357,365,370,374,381,384,387,395,399,402,407,411,414,417,420,423,432,436,439,450,453,457,460,463,474,477,481,485,493,496,500,508,537,541,544,549,553,556,560,563,566,571,580,585,588,591,602,606,609,620,624,627,630,635,638,667,671,673,679,682,687,692,695,699,713,717,728,732,747,756,767,770,773,777,780,783,794,797,800,803,808,813,817,821,838,842,856,861,865,876,879,895,899,910,915,920,928,932,935,939,946,950,953,962,967,976,982,991,1000,1009,1018,1027,1035],[48,317,318],{},"The rise of DeepSeek has shaken the AI infrastructure market, forcing companies to confront the escalating costs of training and deploying AI models. But the real pressure point isn’t just compute—it’s data acquisition and ingestion costs.",[48,320,321],{},"As businesses rethink their AI cost-containment strategies, real-time data streaming is emerging as a critical enabler. The growing adoption of Kafka as a standard protocol has expanded cost-efficient options, allowing companies to optimize streaming analytics while keeping expenses in check.",[48,323,324],{},"Ursa, the data streaming engine powering StreamNative’s managed Kafka service, is built for this new reality. With its leaderless architecture and native lakehouse storage integration, Ursa eliminates costly inter-zone network traffic for data replication and client-to-broker communication while ensuring high availability at minimal operational cost.",[48,326,327],{},"In this blog post, we benchmarked the infrastructure cost and total cost of ownership (TCO) for running a 5GB\u002Fs Kafka workload across different Kafka vendors, including Redpanda, Confluent WarpStream, and AWS MSK. Our benchmark results show that Ursa can sustain 5GB\u002Fs Kafka workloads at just 5% of the cost of traditional streaming engines like Redpanda—making it the ideal solution for high-performance, cost-efficient ingestion and data streaming for data lakehouses and AI workloads.",[48,329,330],{},"Note: We also evaluated vanilla Kafka in our benchmark; however, for simplicity, we have focused our cost comparison on vendor solutions rather than self-managed deployments. That said, it is important to highlight that both Redpanda and vanilla Kafka use a leader-based data replication approach. 
In a data-intensive, network-bound workload like 5GB\u002Fs streaming, with the same machine type and replication factor, Redpanda and vanilla Kafka produced nearly identical cost profiles.",[40,332,334],{"id":333},"key-benchmark-findings","Key Benchmark Findings",[48,336,337],{},"Ursa delivered 5 GB\u002Fs of sustained throughput at an infrastructure cost of just $54 per hour. For comparison:",[339,340,341,345],"ul",{},[342,343,344],"li",{},"MSK: $303 per hour → 5.6x more expensive compared to Ursa",[342,346,347],{},"Redpanda: $988 per hour → 18x more expensive compared to Ursa",[48,349,350],{},[351,352],"img",{"alt":18,"src":353},"\u002Fimgs\u002Fblogs\u002F679c71b67d9046f26edc7977_AD_4nXfvTqyBNUBu2lObdkKAx-5UNkpNP8UYULLZyOcixE6z99VMZUUEsUqWjzexI7vjyNGRNSAUoM9smYvdTP55ctAhIbrs5lmQgcSVMWdaoigbWouCl95DVSQsxooY-qqfGcYqS4g4zA.png",[48,355,356],{},"Beyond infrastructure costs, when factoring in both storage pricing, vendor pricing and operational expenses, Ursa’s total cost of ownership (TCO) for a 5GB\u002Fs workload with a 7-day retention period is:",[339,358,359,362],{},[342,360,361],{},"50% cheaper than Confluent WarpStream",[342,363,364],{},"85% cheaper than MSK and Redpanda",[48,366,367],{},[351,368],{"alt":18,"src":369},"\u002Fimgs\u002Fblogs\u002F679c602d77e9c706de5343b8_AD_4nXeDv8rrv_C1CTCCiqYo1zpvlGYbdBk1r0VEqovAPu22iFMQZgh54Hfw9PBMLzM7jDFxKwAFDxbdG0np4XVk_tGsWhEKMloLRcmmea7lvueCx-0cFsyaE3Mya4Mxc1Dox95A6JEc.png",[40,371,373],{"id":372},"ursa-highly-cost-efficient-data-streaming-at-scale","Ursa: Highly Cost-Efficient Data Streaming at Scale",[48,375,376,380],{},[55,377,379],{"href":378},"\u002Fblog\u002Fursa-reimagine-apache-kafka-for-the-cost-conscious-data-streaming","Ursa"," is a next-generation data streaming engine designed to deliver high performance at a fraction of the cost of traditional disk-based solutions. It is fully compatible with Apache Kafka and Apache Pulsar APIs, while leveraging a leaderless, lakehouse-native architecture to maximize scalability, efficiency, and cost savings.",[48,382,383],{},"Ursa’s key innovation is separating storage from compute and decoupling metadata\u002Findex operations from data operations by utilizing cloud object storage (e.g., AWS S3) instead of costly inter-zone disk-based replication. It also employs open lakehouse formats (Iceberg and Delta Lake), enabling columnar compression to significantly reduce storage costs while maintaining durability and availability.",[48,385,386],{},"In contrast, traditional streaming systems—like Kafka and Redpanda—depend on leader-based architectures, which drive up inter-zone traffic costs due to replication and client communication. Ursa mitigates these costs by:",[339,388,389,392],{},[342,390,391],{},"Eliminating inter-zone traffic costs via a leaderless architecture.",[342,393,394],{},"Replacing costly inter-zone replication with direct writes to cloud storage using open lakehouse formats.",[40,396,398],{"id":397},"how-ursa-eliminates-inter-zone-traffic","How Ursa Eliminates Inter-Zone Traffic",[48,400,401],{},"Ursa minimizes inter-zone traffic by leveraging a leaderless architecture, which eliminates inter-zone communication between clients and brokers, and lakehouse-native storage, which removes the need for inter-zone data replication. 
This approach ensures high availability and scalability while avoiding unnecessary cross-zone data movement.",[48,403,404],{},[351,405],{"alt":18,"src":406},"\u002Fimgs\u002Fblogs\u002F679c602e21b3571bb7117dca_AD_4nXd7Oahc77NjRLNvA9clLt0tsyU6MrIqVibFYv5pW5giTIcCHPr3EA_yTGzfVEUIVO3VXK56qWK8zmBCp5lY0E_4nmlWIPFrHjtHylA5NhwELjn-UB0fLG2h_kbrxrc7Cs_edvveNA.png",[32,408,410],{"id":409},"leaderless-architecture","Leaderless architecture",[48,412,413],{},"Traditional streaming engines such as Kafka, Pulsar, or RedPanda rely on a leader-based model, where each partition is assigned to a single leader broker that handles all writes and reads.",[48,415,416],{},"Pros of Leader-Based Architectures:\n✔ Maintains message ordering via local sequence IDs\n✔ Delivers low latency and high performance through message caching",[48,418,419],{},"Cons of Leader-Based Architectures:\n✖ Throughput bottlenecked by a single broker per partition\n✖ Inter-zone traffic required for high availability in multi-AZ deployments",[48,421,422],{},"While Kafka and Pulsar offer partial solutions (e.g., reading from followers, shadow topics) to reduce read-related inter-zone traffic, producers still send data to a single leader.",[48,424,425,426,431],{},"Ursa removes the concept of topic ownership, allowing any broker in the cluster to handle reads or writes for any partition. The primary challenge—ensuring message ordering—is solved with ",[55,427,430],{"href":428,"rel":429},"https:\u002F\u002Fgithub.com\u002Fstreamnative\u002Foxia",[264],"Oxia",", a scalable metadata and index service created by StreamNative in 2022.",[32,433,435],{"id":434},"oxia-the-metadata-layer-enabling-leaderless-architecture","Oxia: The Metadata Layer Enabling Leaderless Architecture",[48,437,438],{},"Ensuring message ordering in a leaderless architecture is complex, but Ursa solves this with Oxia:",[339,440,441,444,447],{},[342,442,443],{},"Handles millions of metadata\u002Findex operations per second",[342,445,446],{},"Generates sequential IDs to maintain strict message ordering",[342,448,449],{},"Optimized for Kubernetes with horizontal scalability",[48,451,452],{},"Producers and consumers can connect to any broker within their local AZ, eliminating inter-zone traffic costs while maintaining performance through localized caching.",[32,454,456],{"id":455},"zero-interzone-data-replication","Zero interzone data replication",[48,458,459],{},"In most distributed systems, data replication from a leader (primary) to followers (replicas) is crucial for fault tolerance and availability. 
However, replication across zones can inflate infrastructure expenses substantially.",[48,461,462],{},"Ursa avoids these costs by writing data directly to cloud storage (e.g., AWS S3, Google GCS):",[339,464,465,468,471],{},[342,466,467],{},"Built-In Resilience: Cloud storage inherently offers high availability and fault tolerance without inter-zone traffic fees.",[342,469,470],{},"Tradeoff: Slightly higher latency (sub-second, with p99 at 500 milliseconds) compared to local disk\u002FEBS (single-digit to sub-100 milliseconds), in exchange for significantly lower costs (up to 10x lower).",[342,472,473],{},"Flexible Modes: Ursa is an addition to the classic BookKeeper-based engine, providing users with the flexibility to optimize for either cost or low latency based on their workload requirements.",[48,475,476],{},"By foregoing conventional replication, Ursa slashes inter-zone traffic costs and associated complexities—making it a compelling option for organizations seeking to balance high-performance data streaming with strict budget constraints.",[40,478,480],{"id":479},"how-we-ran-a-5-gbs-test-with-ursa","How We Ran a 5 GB\u002Fs Test with Ursa",[32,482,484],{"id":483},"ursa-cluster-deployment","Ursa Cluster Deployment",[339,486,487,490],{},[342,488,489],{},"9 brokers across 3 availability zones, each on m6i.8xlarge (Fixed 12.5 Gbps bandwidth, 32 vCPU cores, 128 GB memory).",[342,491,492],{},"Oxia cluster (metadata store) with 3 nodes of m6i.8xlarge, distributed across three availability zones (AZs).",[48,494,495],{},"During peak throughput (5 GB\u002Fs), each broker’s network usage was about 10 Gbps.",[32,497,499],{"id":498},"openmessaging-benchmark-workers-configuration","OpenMessaging Benchmark Workers & Configuration",[48,501,502,503,507],{},"The OpenMessaging Benchmark(OMB) Framework is a suite of tools that make it easy to benchmark distributed messaging systems in the cloud. Please check ",[55,504,505],{"href":505,"rel":506},"https:\u002F\u002Fopenmessaging.cloud\u002Fdocs\u002Fbenchmarks\u002F",[264]," for details.",[339,509,510,525,534],{},[342,511,512,513,518,519,524],{},"12 OMB workers: 6 for ",[55,514,517],{"href":515,"rel":516},"https:\u002F\u002Fgist.github.com\u002Fcodelipenghui\u002Fd1094122270775e4f1580947f80c5055",[264],"producers",", 6 for ",[55,520,523],{"href":521,"rel":522},"https:\u002F\u002Fgist.github.com\u002Fcodelipenghui\u002F06bada89381fb77a7862e1b4c1d8963d",[264],"consumers"," across 3 availability zones, on m6i.8xlarge instances. 
Each worker is configured with 12 CPU cores and 48 GB memory.",[342,526,527,528,533],{},"Sample YAML ",[55,529,532],{"href":530,"rel":531},"https:\u002F\u002Fgist.github.com\u002Fcodelipenghui\u002F204c1f26c4d44a218ae235bf2de99904",[264],"scripts"," provided for Kafka-compatible configuration and rate limits.",[342,535,536],{},"Achieved consistent 5 GB\u002Fs publish\u002Fsubscribe throughput.",[40,538,540],{"id":539},"ursa-benchmark-tests-results","Ursa Benchmark Tests & Results",[48,542,543],{},"The following diagram demonstrates that Ursa can consistently handle 5 GB\u002Fs of traffic, fully saturating the network across all broker nodes.",[48,545,546],{},[351,547],{"alt":18,"src":548},"\u002Fimgs\u002Fblogs\u002F679c602d7b261bac1113f7d6_AD_4nXdDPsRc3koXICiFF0bqSmGWbJt_RlUy4FE3ruuWOfbCfpcqZ1dejjqGbkaCJv2hQFL1nirRouBVRW2l5uMWBvY9naMqGB_wHcLI14dBM0f85TXhmdm3UxEv1yGX9Y4hf5FttSkZew.png",[40,550,552],{"id":551},"comparing-infrastructure-cost","Comparing Infrastructure Cost",[48,554,555],{},"This benchmark first evaluates infrastructure costs of running a 5 GB\u002Fs streaming workload (1:1 producer-to-consumer ratio) across different data streaming engines, including Ursa, Redpanda, and AWS MSK, with a focus on multi-AZ deployments to ensure a fair comparison.",[32,557,559],{"id":558},"test-setup-key-assumptions","Test Setup & Key Assumptions",[48,561,562],{},"All tests use multi-AZ configurations, with clusters and clients distributed across three AWS availability zones (AZs). Cluster size scales proportionally to the number of AZs, and rack-awareness is enabled for all engines to evenly distribute topic partitions and leaders.",[48,564,565],{},"To ensure a fair comparison, we selected the same machine type capable of fully utilizing both network and storage bandwidth for Ursa and Redpanda in this 5GB\u002Fs test:",[339,567,568],{},[342,569,570],{},"9 × m6i.8xlarge instances",[48,572,573,574,579],{},"However, MSK's storage bandwidth limits vary depending on the selected instance type, with the highest allowed limit capped at 1000 MiB\u002Fs per broker, according to",[55,575,578],{"href":576,"rel":577},"https:\u002F\u002Fdocs.aws.amazon.com\u002Fmsk\u002Flatest\u002Fdeveloperguide\u002Fmsk-provision-throughput-management.html#throughput-bottlenecks",[264]," AWS documentation",". 
Given this constraint, achieving 5 GB\u002Fs throughput with a replication factor of 3 required the following setup:",[339,581,582],{},[342,583,584],{},"15 × kafka.m7g.8xlarge (32 vCPUs, 128 GB memory, 15 Gbps network, 4000 GiB EBS).",[48,586,587],{},"This configuration was necessary to work around MSK's storage bandwidth limitations, ensuring a comparable cost basis to other evaluated streaming engines.",[48,589,590],{},"Additional key assumptions include:",[339,592,593,596,599],{},[342,594,595],{},"Inter-AZ producer traffic: For leader-based engines, two-thirds of producer-to-broker traffic crosses AZs due to leader distribution.",[342,597,598],{},"Consumer optimizations: Follower fetch is enabled across all tests, eliminating inter-AZ consumer traffic.",[342,600,601],{},"Storage cost exclusions: This benchmark only evaluates streaming costs, assuming no long-term data retention.",[32,603,605],{"id":604},"inter-broker-replication-costs","Inter-Broker Replication Costs",[48,607,608],{},"Inter-broker (cross-AZ) replication is a major cost driver for data streaming engines:",[339,610,611,614,617],{},[342,612,613],{},"RedPanda: Inter-broker replication is not free, leading to substantial costs when data must be copied across multiple availability zones.",[342,615,616],{},"AWS MSK: Inter-broker replication is free, but MSK instance pricing is significantly higher (e.g., $3.264 per hour for kafka.m7g.8xlarge vs $1.306 per hour for an on-demand m7g.8xlarge). The storage price of MSK is $0.10 per GB-month which is significantly higher than st1, which costs $0.045 per GB-month. Even though replication is free, client-to-broker traffic still incurs inter-AZ charges.",[342,618,619],{},"Ursa: No inter-broker replication costs due to its leaderless architecture, eliminating inter-zone replication costs entirely.",[32,621,623],{"id":622},"zone-affinity-reducing-inter-az-costs","Zone Affinity: Reducing Inter-AZ Costs",[48,625,626],{},"We evaluated zone affinity mechanisms to further reduce inter-AZ data transfer costs.",[48,628,629],{},"Consumers:",[339,631,632],{},[342,633,634],{},"Follower fetch is enabled across all tests, ensuring consumers fetch data from replicas in their local AZ—eliminating inter-zone consumer traffic except for metadata lookups",[48,636,637],{},"Producers:",[339,639,640,649,658],{},[342,641,642,643,648],{},"Kafka protocol lacks an easy way to enforce producer AZ affinity (though ",[55,644,647],{"href":645,"rel":646},"https:\u002F\u002Fcwiki.apache.org\u002Fconfluence\u002Fdisplay\u002FKAFKA\u002FKIP-1123:+Rack-aware+partitioning+for+Kafka+Producer",[264],"KIP-1123"," aims to address this). And it only works with the default partitioner (i.e., when no record partition or record key is specified).",[342,650,651,652,657],{},"Redpanda recently introduced ",[55,653,656],{"href":654,"rel":655},"https:\u002F\u002Fdocs.redpanda.com\u002Fredpanda-cloud\u002Fdevelop\u002Fproduce-data\u002Fleader-pinning\u002F",[264],"leader pinning",", but this only benefits setups where producers are confined to a single AZ—not applicable to our multi-AZ benchmark.",[342,659,660,661,666],{},"Ursa is the only system in this test with ",[55,662,665],{"href":663,"rel":664},"https:\u002F\u002Fdocs.streamnative.io\u002Fdocs\u002Fconfig-kafka-client#eliminate-cross-az-networking-traffic",[264],"built-in zone affinity for both producers and consumers",". 
It achieves this by embedding producer AZ information in client.id, allowing metadata lookups to route clients to local-AZ brokers, eliminating inter-AZ producer traffic.",[32,668,670],{"id":669},"cost-comparison-results","Cost Comparison Results",[48,672,337],{},[339,674,675,677],{},[342,676,344],{},[342,678,347],{},[48,680,681],{},"Ursa’s leaderless architecture, zone affinity, and native cloud storage integration deliver unparalleled cost efficiency, making it the most cost-effective choice for high-throughput data streaming workloads.",[48,683,684],{},[351,685],{"alt":18,"src":686},"\u002Fimgs\u002Fblogs\u002F679c72208198ca36a352f228_AD_4nXeeZuM8T-xBlD4Vf3j67K618n08qh8wIDLLtiLJG0ssA1Wj1V26u7wIDTX9sqLrtw8mB2c299dwzarGen62CG0Vh7nWstn5qbPGFcBaKJYEepTsLr5fHWv1U8uqbg8Y0UOK6fJ7.png",[48,688,689],{},[351,690],{"alt":18,"src":691},"\u002Fimgs\u002Fblogs\u002F679c625978031f40229de484_AD_4nXdLkLLJ30KKr-_A_rN1j8akVwBYacAWIPzWHoOReJF421890kfByZoQQxkLczihVSmiw5Q9J51-V9I2SEKITbwsYnANDDTlAVL5nQ_jfaHNTe9VEWhSoa7DZooCnilDYL6l6msmJg.png",[48,693,694],{},"The detailed infrastructure cost calculations for each data streaming engine are listed below:",[32,696,698],{"id":697},"streamnative-ursa","StreamNative - Ursa",[339,700,701,704,707,710],{},[342,702,703],{},"Server EC2 costs: 9 * $1.536\u002Fhr = $14",[342,705,706],{},"Client EC2 costs: 9 * $1.536\u002Fhr =$14",[342,708,709],{},"S3 write requests costs: 1350 r\u002Fs * $0.005\u002F1000r * 3600s = $24",[342,711,712],{},"S3 read requests costs: 1350 r\u002Fs * $0.0004\u002F1000r * 3600s = $2",[32,714,716],{"id":715},"aws-msk","AWS MSK",[339,718,719,722,725],{},[342,720,721],{},"Server EC2 costs: 15 * $3.264\u002Fhr = $49",[342,723,724],{},"Client side EC2 costs: 9 * $1.536\u002Fhr =$14",[342,726,727],{},"Interzone traffic - producer to broker: 5GB\u002Fs * ⅔ * $0.02\u002FG(in+out) * 3600 = $240",[32,729,731],{"id":730},"redpanda","RedPanda",[339,733,734,736,738,741,744],{},[342,735,703],{},[342,737,706],{},[342,739,740],{},"Interzone traffic - producer to broker: 5GB\u002Fs * ⅔ * $0.02\u002FGB(in+out) * 3600 = $240",[342,742,743],{},"Interzone traffic - replication: 10GB\u002Fs * $0.02\u002FGB(in+out) * 3600 = $720",[342,745,746],{},"Interzone traffic - broker to consumer: $0 (fetch from local zone)",[48,748,749,750,755],{},"Please note that we were unable to test ",[55,751,754],{"href":752,"rel":753},"https:\u002F\u002Fwww.redpanda.com\u002Fblog\u002Fcloud-topics-streaming-data-object-storage",[264],"Redpanda with Cloud Topics",", as it remains an announced but unreleased feature and is not yet available for evaluation. Based on the limited information available, while Cloud Topics may help optimize inter-zone data replication costs, producers still need to traverse inter-availability zones to connect to the topic partition owners and incur inter-zone traffic costs of up to $240 per hour.",[339,757,758,764],{},[342,759,760,763],{},[55,761,647],{"href":645,"rel":762},[264]," (when implemented) will help mitigate producer-to-broker inter-zone traffic, but it is not yet available. And it only works with the default partitioner (no record partition or key is specified).",[342,765,766],{},"Redpanda’s leader pinning helps only when all producers for the pinned topic are confined to a single AZ. In multi-AZ environments (like our benchmark), inter-zone producer traffic remains unavoidable.",[48,768,769],{},"Additionally, Redpanda’s Cloud Topics architecture is not documented publicly. 
Their blog mentions \"leader placement rules to optimize produce latency and ingress cost,\" but it is unclear whether this represents a shift away from a leader-based architecture or if it uses techniques similar to Ursa’s zone-aware approach.",[48,771,772],{},"We may revisit this comparison as more details become available.",[40,774,776],{"id":775},"comparing-total-cost-of-ownership","Comparing Total Cost of Ownership",[48,778,779],{},"As highlighted earlier, with a BYOC Ursa setup, you can achieve 5 GB\u002Fs throughput at just 5% of the infrastructure cost of a traditional leader-based data streaming engine, such as Kafka or RedPanda, while managing the infrastructure yourself. This significant cost reduction is enabled by Ursa’s leaderless architecture and lakehouse-native storage design, which eliminate overhead costs such as inter-zone traffic and leader-based data replication. By leveraging a lakehouse-native, leaderless architecture, Ursa reduces resource requirements, enabling you to handle high data throughput efficiently and at a fraction of the cost of RedPanda.",[48,781,782],{},"Now, let’s examine the total cost comparison, evaluating Ursa alongside other vendors, including those that have adopted a leaderless architecture (e.g., Confluent WarpStream). This comparison is based on a 5GB\u002Fs workload with a 7-day retention period, factoring in both storage and vendor costs. Here are the key findings:",[339,784,785,788,791],{},[342,786,787],{},"Ursa ($164,353\u002Fmonth) is: 50% cheaper than Confluent WarpStream ($337,068\u002Fmonth)",[342,789,790],{},"85% cheaper than AWS MSK ($1,115,251\u002Fmonth)",[342,792,793],{},"86% cheaper than Redpanda ($1,202,853\u002Fmonth)",[48,795,796],{},"In addition to Ursa’s architectural advantages—eliminating most inter-AZ traffic and leveraging lakehouse storage for cost-effective data retention—it also adopts a fairer and more cost-efficient pricing model: Elastic Throughput-based pricing. This approach aligns costs with actual usage, avoiding unnecessary overhead.",[48,798,799],{},"Unlike WarpStream, which charges for both storage and throughput, Ursa ensures that customers only pay for the throughput they actively use. Ursa’s pricing is based on compressed data sent by clients, meaning the more data compressed on the client side, the lower the cost. 
In contrast, WarpStream prices are based on uncompressed data, unfairly inflating expenses and failing to incentivize customers to optimize their client applications.",[48,801,802],{},"This distinction is crucial, as compressed data reduces both storage and network costs, making Ursa’s pricing model not only more cost-effective but also more transparent and predictable.",[48,804,805],{},[351,806],{"alt":18,"src":807},"\u002Fimgs\u002Fblogs\u002F679c602d194800c9206d9d58_AD_4nXcFlf755xgyz7htxhMhBV5fGrsxy642mQNodt61DTok_z1dwkw5A6lkO5hatXVneCaB0anbZPAyvLI3MlIMuQEYLEACHHvQMOr5UfaB37dfzkdqewDEvcT-20VGd_zzvJsuA00zGA.png",[48,809,810],{},[351,811],{"alt":18,"src":812},"\u002Fimgs\u002Fblogs\u002F679c62594e9c2e629fae73aa_AD_4nXeU6cOgItnjLsEZCOf13TEvMY_SHWWIxYP2OYUj-B1GUPyWO78OG08K_v03hwYSVcg06f9dqDiGmdwy76vynjmiDGL5bluZ5_XF4nSU_r59oOZdfViXndXt6s11vVOY7qwfZN8v.png",[32,814,816],{"id":815},"cost-breakdown","Cost Breakdown",[818,819,820],"h4",{"id":697},"StreamNative – Ursa",[339,822,823,826,829,832,835],{},[342,824,825],{},"EC2 (Server): 9 × $1.536\u002Fhr × 24 hr × 30 days = $9,953.28",[342,827,828],{},"S3 Write Requests: 1,350 r\u002Fs × $0.005\u002F1,000 r × 3,600 s × 24 hr × 30 days = $17,496",[342,830,831],{},"S3 Read Requests: 1,350 r\u002Fs × $0.0004\u002F1,000 r × 3,600 s × 24 hr × 30 days = $1,400",[342,833,834],{},"S3 Storage Costs: 5 GB\u002Fs × $0.021\u002FGB × 3,600 s × 24 hr × 7 days = $63,504",[342,836,837],{},"Vendor Cost: 200 ETU × $0.50\u002Fhr × 24 hr × 30 days = $72,000",[818,839,841],{"id":840},"warpstream","WarpStream",[339,843,844,847],{},[342,845,846],{},"Based on WarpStream’s pricing calculator (as of January 29, 2025), we assume a 4:1 client data compression ratio, meaning 20 GB\u002Fs of uncompressed data translates to 5 GB\u002Fs of compressed data.",[342,848,849,850,855],{},"It's important to note that WarpStream’s pricing structure has fluctuated frequently throughout January. We observed the cost reported by their calculator changing from $409,644 per month to $337,068 per month. This variability has been previously highlighted in the blog post “",[55,851,854],{"href":852,"rel":853},"https:\u002F\u002Fbigdata.2minutestreaming.com\u002Fp\u002Fthe-brutal-truth-about-apache-kafka-cost-calculators",[264],"The Brutal Truth About Kafka Cost Calculators","”. To ensure transparency, we have documented the pricing as of January 29, 2025.",[48,857,858],{},[351,859],{"alt":18,"src":860},"\u002Fimgs\u002Fblogs\u002F679c602e42713e0028e9af5e_AD_4nXcu5_VWTLu9jRYs6zX1MBAOtLQEo5gyfNSWPcbpnQHXTa8qNCFAXezRR2E8daygzYTTwd4dhJjaLaLM8C6y_3OGbu2NS7pdvEv3a8-ptNKOg7AeKnYqPQCAYvQ5EuxzuI3JYIvY.png",[818,862,864],{"id":863},"msk","MSK",[339,866,867,870,873],{},[342,868,869],{},"EC2 (Server): 15 * $3.264\u002Fhr × 24 hr × 30 days = $35,251",[342,871,872],{},"Interzone Traffic (Client-Server): 5 GB\u002Fs × ⅔ × $0.02\u002FGB (in+out) × 3,600 s × 24 hr × 30 days = $172,800",[342,874,875],{},"Storage: 5 GB\u002Fs × $0.1\u002FGB-month × 3,600 s × 24 hr × 7 days * 3 replicas = $907,200",[818,877,731],{"id":878},"redpanda-1",[339,880,881,884,886,889,892],{},[342,882,883],{},"EC2 (Server): 9 × $1.536\u002Fhr × 24 hr × 30 days = $9953",[342,885,872],{},[342,887,888],{},"Interzone Traffic (Replication): 5 GB\u002Fs × 2 × $0.02\u002FGB (in+out) × 3,600 s × 24 hr × 30 days = $518,400",[342,890,891],{},"Storage: 5 GB\u002Fs × $0.045\u002FGB-month(st1) × 3,600 s × 24 hr × 7 days * 3 replicas = $408,240",[342,893,894],{},"Vendor Cost: $93,333 per month (based on limited information. 
See additional notes below).",[818,896,898],{"id":897},"additional-notes","Additional Notes",[339,900,901],{},[342,902,903,904,909],{},"Redpanda does not publicly disclose its BYOC pricing, making it difficult to accurately assess its total costs. We refer to information from the whitepaper “",[55,905,908],{"href":906,"rel":907},"https:\u002F\u002Fwww.redpanda.com\u002Fresources\u002Fredpanda-vs-confluent-performance-tco-benchmark-report#form",[264],"Redpanda vs. Confluent: A Performance and TCO Benchmark Report by McKnight Consulting Group.","” for estimation purposes. Based on the Tier-8 pricing model in the whitepaper,  the estimated cost to support a 5GB\u002Fs workload would be $1.12 million per year ($93,333 per month). However, since this calculation is based on an estimation, we will revisit and refine the cost assessment once Redpanda publishes its BYOC pricing.",[48,911,912],{},[351,913],{"alt":18,"src":914},"\u002Fimgs\u002Fblogs\u002F679c602dc8a9859eed89a0ef_AD_4nXdbcO8vsNNPy4GtkNLlmNKf22fjxRvzLzH7CtOna1L08sTbvnZx3HhufeFqc1w4K2gEF7lxO2IR5supotxebAiGnA07Qa8Yr3Rd1pVK2LYKK4WurlJGwgdwwucZIFoF-N_2oBjY.png",[48,916,917],{},[351,918],{"alt":18,"src":919},"\u002Fimgs\u002Fblogs\u002F679c602d6bc1c2287e012540_AD_4nXfcHZnLfjbjIr3ZAgoQXT9dwP3aQCOQPmGZZJUtpNZSwE6qY6M3yehIaBxCwxEIeu5PVdUPY0zhyjnow26YfgjdYgSG4GnV9ibxu0YWTIpwng6z_F6FUGJMpERMKtpsFESzXSN_Sw.png",[339,921,922,925],{},[342,923,924],{},"When estimating the storage costs for Kafka and Redpanda, we assume the use of HDD storage at $0.045\u002FGB, based on the premise that both systems can fully utilize disk bandwidth without incurring the higher costs associated with GP2 or GP3 volumes. However, in practice, many users opt for GP2 or GP3, significantly increasing the total storage cost for Kafka and Redpanda.",[342,926,927],{},"Unlike disk-based solutions, S3 storage does not require capacity preallocation—Ursa only incurs costs for the actual data stored. This contrasts with Kafka and Redpanda, where preallocating storage can drive up expenses. As a result, the real-world storage costs for Kafka and Redpanda are often 50% higher than the estimates above.",[40,929,931],{"id":930},"conclusion","Conclusion",[48,933,934],{},"Ursa represents a transformative shift in streaming data infrastructure, offering cost efficiency, scalability, and flexibility without compromising durability or reliability. By leveraging a leaderless architecture and eliminating inter-zone data replication, Ursa reduces total cost of ownership by over 90% compared to traditional leader-based streaming engines like Kafka and Redpanda. Its direct integration with cloud storage and scalable metadata & index management via Oxia ensure high availability and simplified infrastructure management.",[32,936,938],{"id":937},"balancing-latency-and-cost","Balancing Latency and Cost",[48,940,941,945],{},[55,942,944],{"href":943},"\u002Fblog\u002Fcap-theorem-for-data-streaming","Ursa trades off slightly higher latency for ultra low cost",", making it an ideal choice for the majority of streaming workloads, especially those that prioritize throughput and cost savings over ultra-low latency. Meanwhile, StreamNative’s BookKeeper-based engine remains the preferred solution for real-time, latency-sensitive applications. 
By combining these two approaches, StreamNative empowers customers with the flexibility to choose the right engine for their specific needs—whether it's maximizing cost savings or achieving ultra low-latency real-time performance.",[32,947,949],{"id":948},"the-future-of-streaming-infrastructure","The Future of Streaming Infrastructure",[48,951,952],{},"In an era where data fuels AI, analytics, and real-time decision-making, managing infrastructure costs is critical to sustaining innovation. Ursa is not just a cost-cutting alternative—it is a forward-thinking, lakehouse-native platform that redefines how modern data streaming infrastructure should be built and operated.",[48,954,955,956,961],{},"Whether your priority is reducing costs, improving flexibility, or ingesting massive data into lakehouses, Ursa delivers a future-proof solution for the evolving demands of real-time data streaming. ",[55,957,960],{"href":958,"rel":959},"https:\u002F\u002Fconsole.streamnative.cloud\u002F",[264],"Get started"," with StreamNative Ursa today!",[963,964,966],"h1",{"id":965},"references","References",[48,968,969,972,973],{},[970,971,430],"span",{}," ",[55,974,975],{"href":975},"\u002Fblog\u002Fintroducing-oxia-scalable-metadata-and-coordination",[48,977,978,972,980],{},[970,979,379],{},[55,981,378],{"href":378},[48,983,984,972,987],{},[970,985,986],{},"StreamNative pricing",[55,988,989],{"href":989,"rel":990},"https:\u002F\u002Fdocs.streamnative.io\u002Fdocs\u002Fbilling-overview",[264],[48,992,993,972,996],{},[970,994,995],{},"WarpStream pricing",[55,997,998],{"href":998,"rel":999},"https:\u002F\u002Fwww.warpstream.com\u002Fpricing#pricingfaqs",[264],[48,1001,1002,972,1005],{},[970,1003,1004],{},"AWS S3 pricing",[55,1006,1007],{"href":1007,"rel":1008},"https:\u002F\u002Faws.amazon.com\u002Fs3\u002Fpricing\u002F",[264],[48,1010,1011,972,1014],{},[970,1012,1013],{},"AWS EBS pricing",[55,1015,1016],{"href":1016,"rel":1017},"https:\u002F\u002Faws.amazon.com\u002Febs\u002Fpricing\u002F",[264],[48,1019,1020,972,1023],{},[970,1021,1022],{},"AWS MSK pricing",[55,1024,1025],{"href":1025,"rel":1026},"https:\u002F\u002Faws.amazon.com\u002Fmsk\u002Fpricing\u002F",[264],[48,1028,1029,972,1032],{},[970,1030,1031],{},"The Brutal Truth about Kafka Cost Calculators",[55,1033,852],{"href":852,"rel":1034},[264],[48,1036,1037,972,1040],{},[970,1038,1039],{},"Redpanda vs. 
Confluent: A Performance and TCO Benchmark Report by McKnight Consulting Group",[55,1041,906],{"href":906,"rel":1042},[264],{"title":18,"searchDepth":19,"depth":19,"links":1044},[1045,1046,1047,1052,1056,1057,1066,1069],{"id":333,"depth":19,"text":334},{"id":372,"depth":19,"text":373},{"id":397,"depth":19,"text":398,"children":1048},[1049,1050,1051],{"id":409,"depth":279,"text":410},{"id":434,"depth":279,"text":435},{"id":455,"depth":279,"text":456},{"id":479,"depth":19,"text":480,"children":1053},[1054,1055],{"id":483,"depth":279,"text":484},{"id":498,"depth":279,"text":499},{"id":539,"depth":19,"text":540},{"id":551,"depth":19,"text":552,"children":1058},[1059,1060,1061,1062,1063,1064,1065],{"id":558,"depth":279,"text":559},{"id":604,"depth":279,"text":605},{"id":622,"depth":279,"text":623},{"id":669,"depth":279,"text":670},{"id":697,"depth":279,"text":698},{"id":715,"depth":279,"text":716},{"id":730,"depth":279,"text":731},{"id":775,"depth":19,"text":776,"children":1067},[1068],{"id":815,"depth":279,"text":816},{"id":930,"depth":19,"text":931,"children":1070},[1071,1072],{"id":937,"depth":279,"text":938},{"id":948,"depth":279,"text":949},"StreamNative Cloud","2025-01-31","Discover how Ursa achieves 5GB\u002Fs Kafka workloads at just 5% of the cost of traditional streaming engines like Redpanda and AWS MSK. See our benchmark results comparing infrastructure costs, total cost of ownership (TCO), and performance across leading Kafka vendors.","\u002Fimgs\u002Fblogs\u002F679c6593d25099b1cdcec4ca_image-31.png",{},"\u002Fblog\u002Fhow-we-run-a-5-gb-s-kafka-workload-for-just-50-per-hour","30 min",{"title":308,"description":1075},"blog\u002Fhow-we-run-a-5-gb-s-kafka-workload-for-just-50-per-hour",[1083,1084,303],"TCO","Apache Kafka","A0o_2xdJiLI6rf6xj4RKsxJNo_A6QN2fYzCp6gaLrFw",{"id":1087,"title":1088,"authors":1089,"body":1091,"category":1818,"createdAt":290,"date":1819,"description":1820,"extension":8,"featured":294,"image":1821,"isDraft":294,"link":290,"meta":1822,"navigation":7,"order":296,"path":1823,"readingTime":1824,"relatedResources":290,"seo":1825,"stem":1826,"tags":1827,"__hash__":1830},"blogs\u002Fblog\u002Fapache-pulsar-vs-apache-kafka-2022-benchmark.md","Apache Pulsar vs. Apache Kafka 2022 Benchmark",[1090,313],"Matteo Merli",{"type":15,"value":1092,"toc":1800},[1093,1096,1115,1121,1124,1127,1130,1136,1138,1142,1145,1149,1152,1156,1159,1163,1170,1173,1177,1180,1184,1187,1196,1200,1203,1206,1210,1213,1216,1219,1222,1225,1229,1243,1246,1257,1260,1265,1273,1278,1292,1295,1298,1301,1309,1323,1328,1339,1342,1346,1350,1353,1357,1360,1374,1382,1386,1391,1397,1401,1404,1407,1410,1418,1421,1425,1428,1431,1440,1447,1451,1460,1464,1467,1470,1473,1476,1479,1483,1486,1489,1492,1495,1498,1508,1515,1519,1528,1532,1535,1538,1541,1544,1552,1555,1559,1562,1565,1576,1579,1584,1589,1594,1599,1602,1610,1617,1621,1644,1648,1651,1654,1657,1660,1663,1665,1668,1679,1683,1690,1693,1696,1700,1704,1754,1758,1765,1767,1776,1785,1794,1796],[48,1094,1095],{},"The Apache PulsarTM versus Apache KafkaⓇ debate continues. Organizations often make comparisons based on features, capabilities, size of the community, and a number of other metrics of varying importance. This report focuses purely on comparing the technical performance based on benchmark tests.",[48,1097,1098,1099,1103,1104,1109,1110,1114],{},"The last widely published ",[55,1100,1102],{"href":1101},"\u002Fwhitepapers\u002Fbenchmarking-pulsar-vs-kafka","Pulsar versus Kafka benchmark"," was performed in 2020, and a lot has happened since then. 
In 2021, Pulsar ranked as a ",[55,1105,1108],{"href":1106,"rel":1107},"https:\u002F\u002Fhubs.ly\u002FQ01701DL0",[264],"Top 5 Apache Software Foundation"," project and ",[55,1111,1113],{"href":1112},"\u002Fblog\u002Fpulsar-hits-400th-contributor-passes-kafka-monthly-active-contributors","surpassed Apache Kafka"," in monthly active contributors as shown in the chart below. Pulsar also averaged more monthly active contributors than Kafka for most of the past 18 months.",[48,1116,1117],{},[351,1118],{"alt":1119,"src":1120},"Pulsar vs Kafka result","\u002Fimgs\u002Fblogs\u002F63b3ed65fb095c6d8670d2da_screen-shot-2022-04-07-at-7.51.37-am.png",[48,1122,1123],{},"These contributions led to major performance improvements for Pulsar. To measure the impact of the improvements, the engineering team at StreamNative, led by Matteo Merli, one of the original creators of Apache Pulsar, and Apache Pulsar PMC Chairperson, performed a benchmark study using the Linux Foundation Open Messaging benchmark.",[48,1125,1126],{},"The team measured Pulsar performance in terms of throughput and latency, and then performed the same tests on Kafka. We’ve included the testing framework and details in the report and encourage anyone who is interested in validating the tests to do so.",[48,1128,1129],{},"Let's take a look at three key findings before jumping into the full results.",[48,1131,1132],{},[1133,1134],"binding",{"value":1135},"cta-blog",[40,1137,334],{"id":333},[32,1139,1141],{"id":1140},"_25x-maximum-throughput-compared-to-kafka","2.5x Maximum Throughput Compared to Kafka",[48,1143,1144],{},"Pulsar is able to achieve 2.5 times the maximum throughput compared to Kafka. This is a significant advantage for use cases that ingest and process large volumes of data, such as log analysis, cybersecurity, and sensor data collection. Higher throughput means less hardware, resulting in lower operational costs.",[32,1146,1148],{"id":1147},"_100x-lower-single-digit-publish-latency-than-kafka","100x Lower Single-digit Publish Latency than Kafka",[48,1150,1151],{},"Pulsar provides consistent single-digit publish latency that is 100x lower than Kafka at P99.99 (ms). Low publish latency is important because it enables systems to hand off messages to a message bus quickly. Once a message is published, the data is safe and the \"action\" will be executed.",[32,1153,1155],{"id":1154},"_15x-faster-historical-read-rate-than-kafka","1.5x Faster Historical Read Rate than Kafka",[48,1157,1158],{},"With a historical read rate that is 1.5 times faster than Kafka, applications using Pulsar as their messaging system can catch-up after an unexpected interruption in half the time. Read throughput is critically important for use cases such as Database Migration\u002FReplication where you are feeding data into a system of record.",[40,1160,1162],{"id":1161},"benchmark-tests","Benchmark Tests",[48,1164,1165,1166,1169],{},"Using the Linux Foundation Open Messaging benchmark [",[55,1167,1168],{"href":1101},"1","], we ran the latest versions of Apache Pulsar (2.9.1) and Apache Kafka (3.0.0). To ensure an objective baseline comparison, each test in this Benchmark Report compares Kafka to Pulsar in two scenarios:  Pulsar with Journaling and Pulsar without Journaling.",[48,1171,1172],{},"Pulsar’s default configuration includes Journaling, which offers a higher durability guarantee than Kafka’s default configuration. 
Pulsar without Journaling provides the same durability guarantees as the default Kafka configuration, which results in an apples-to-apples comparison.",[32,1174,1176],{"id":1175},"i-what-we-tested","I. What We Tested",[48,1178,1179],{},"For this benchmark, we selected a handful of tests to represent common patterns in the messaging and streaming domains and to test the limits of each system:",[818,1181,1183],{"id":1182},"a-maximum-sustainable-throughput","A. Maximum Sustainable Throughput",[48,1185,1186],{},"This test measures the maximum data throughput the system can deliver when consumers are keeping up with the incoming traffic. We ran this test in two scenarios to test the upper boundary performance and to test the cost profile for each system:",[1188,1189,1190,1193],"ol",{},[342,1191,1192],{},"Topic with a single partition. This scenario tests the upper boundary performance for a total-order use case or, in the worst case, where partition keys’ data is skewed. At some scale, the design of a system that relies upon single ordering or handling large amounts of skewed data will need to be reconsidered. Pulsar has the ability to handle situations where total ordering is required at higher scale or large amounts of skew arise.",[342,1194,1195],{},"Topic with 100 partitions. With more partitions to stress available resources, this test illustrates how well a system scales horizontally (by adding more machines) and its cost effectiveness. For example, by modeling the hardware cost per 1GB\u002Fs of traffic, it is easy to derive the cost profile for each system.",[818,1197,1199],{"id":1198},"b-publish-latency-at-a-fixed-throughput","B. Publish Latency at a Fixed Throughput",[48,1201,1202],{},"For this test, we set a fixed rate for the incoming traffic and measured the publish latency profile. Publish latency begins at the moment when a producer tries to publish a message and ends at the moment when it receives confirmation from the brokers that the message is stored and replicated.",[48,1204,1205],{},"In many real-world applications, it is required to guarantee a certain latency SLA (service-level agreement). In particular, this is true in cases where the message is published as the result of some user interaction, or when the user is waiting for the confirmation.",[818,1207,1209],{"id":1208},"c-catch-up-reads-backlog-draining","C. Catch-up Reads \u002F Backlog Draining",[48,1211,1212],{},"One of the primary purposes of a messaging bus is to act as a “buffer” between different applications or systems. When the consumers are not available, or when there are not enough of them, the system accumulates the data.",[48,1214,1215],{},"In these situations, the system must be able to let the consumers drain the backlog of accumulated data and catch up with the newly produced data as fast as possible.",[48,1217,1218],{},"While this catch-up is happening, it is important that there is no impact on the performance of existing producers (in terms of throughput and latency) on the same topic or in other topics that are present in the cluster.",[48,1220,1221],{},"In all the tests, producers and consumers are always running from a dedicated pool of nodes, and all messages contain a 1KB payload. 
Additionally, in each test, both Pulsar and Kafka are configured to provide two guaranteed copies of each message.",[48,1223,1224],{},"Note: Pulsar also supports message queuing, complex routing, individual and negative acknowledgments, delayed message delivery, and dead-letter-queues (features not available in Kafka). This benchmark does not evaluate these features.",[32,1226,1228],{"id":1227},"ii-how-we-set-up-the-tests","II. How We Set up the Tests",[48,1230,1231,1232,1236,1237,1242],{},"The benchmark uses the Linux Foundation Open Messaging Benchmark suite [",[55,1233,1168],{"href":1234,"rel":1235},"https:\u002F\u002Fopenmessaging.cloud\u002Fdocs\u002Fbenchmarks\u002F?utm_campaign=Benchmarking%20Pulsar%20vs.%20Kafka%202022&utm_source=%20Linux%20Foundation%20Open%20Messaging%20Benchmark%20Link&utm_medium=Benchmark%202022%20Report%20Reference",[264],"]. You can find all deployments, configurations, and workloads in the Open Messaging Benchmark Github repo [",[55,1238,1241],{"href":1239,"rel":1240},"https:\u002F\u002Fgithub.com\u002Fopenmessaging\u002Fbenchmark?utm_campaign=Benchmarking%20Pulsar%20vs.%20Kafka%202022&utm_source=Open%20Messaging%20Benchmark%20Github%20Link&utm_medium=Benchmark%202022%20Report%20Reference",[264],"2","].",[48,1244,1245],{},"The testbed for the OpenMessaging Benchmark is set up as follows:",[1188,1247,1248,1251,1254],{},[342,1249,1250],{},"3 Broker VMs  of type i3en.6xlarge, with 24-cores, 192GB of memory, 25Gbps guaranteed networking, and two NVMe SSD devices that support up to 1GB\u002Fs write throughput on each disk.",[342,1252,1253],{},"4 Client (producers and consumers) VMs  of type m5n.8xlarge, with 32-cores and with 25Gbps of guaranteed networking throughput and 128GB of memory to ensure the bottleneck would not be on the client-side.",[342,1255,1256],{},"ZooKeeper VMs of type t2.small. These are not critical because ZooKeeper is not stressed in any form during the benchmark execution.",[48,1258,1259],{},"We tested two configurations for Pulsar:",[1188,1261,1262],{},[342,1263,1264],{},"Pulsar with Journaling (Default):",[339,1266,1267,1270],{},[342,1268,1269],{},"Uses a journal for strong durability (this exceeds the durability provided by Kafka).",[342,1271,1272],{},"Replicates and f-syncs data on disk before acknowledging producers.",[1188,1274,1275],{},[342,1276,1277],{},"Pulsar without Journaling:",[339,1279,1280,1283,1286,1289],{},[342,1281,1282],{},"Replicates data in memory on multiple nodes, before acknowledging producers, and then flushes to disk in the background.",[342,1284,1285],{},"Provides the same durability guarantees as Kafka.",[342,1287,1288],{},"Achieves higher throughput and lower latency when compared to the default Pulsar setup with journaling.",[342,1290,1291],{},"Provides a cost-effective alternative to the standard Pulsar setup, at the expense of strong durability. (“Strong durability” means that the data is flushed to disk before an acknowledgement is returned.)",[48,1293,1294],{},"We configured Apache Pulsar 2.9.1 to run with the 3\u002F3\u002F2 persistence policy, which writes entries to 3 storage nodes and waits for 2 confirmations. 
We deployed 1 broker and 1 bookie on each of the 3 VMs.",[48,1296,1297],{},"We used Apache Kafka 3.0.0 and the configuration recommended by Confluent in its fork of the OpenMessaging benchmark.",[48,1299,1300],{},"Details on the Kafka configurations include:",[1188,1302,1303,1306],{},[342,1304,1305],{},"Uses in-memory replication (via the OS page cache), so data is not guaranteed to be on disk when a producer is acknowledged.",[342,1307,1308],{},"Uses the recommended Confluent setup to increase the throughput compared to the defaults:",[339,1310,1311,1314,1317,1320],{},[342,1312,1313],{},"num.replica.fetchers=8",[342,1315,1316],{},"message.max.bytes=10485760",[342,1318,1319],{},"replica.fetch.max.bytes=10485760",[342,1321,1322],{},"num.network.threads=8",[1188,1324,1325],{"start":279},[342,1326,1327],{},"Uses producer settings to ensure a minimum replication factor of 2:",[339,1329,1330,1333,1336],{},[342,1331,1332],{},"acks=all",[342,1334,1335],{},"replicationFactor=3",[342,1337,1338],{},"min.insync.replicas=2",[48,1340,1341],{},"Note: For both Kafka and Pulsar, the clients were configured to use ZGC to get lower GC pause time.",[32,1343,1345],{"id":1344},"iii-benchmark-tests-results","III. Benchmark Tests & Results",[818,1347,1349],{"id":1348},"a-test-1-maximum-throughput","A. Test #1: Maximum Throughput",[48,1351,1352],{},"This test measures the maximum “sustainable throughput” reachable on a topic, i.e., the maximum throughput that producers can push through to consumers without accumulating any backlog.",[225,1354,1356],{"id":1355},"_1-test-1-case-1-maximum-throughput-with-1-partition","1. Test #1 \u002F Case #1: Maximum Throughput with 1 Partition",[48,1358,1359],{},"This first test uses a topic with a single partition to establish the boundary for ingesting data in a totally ordered way. This is common in use cases where a single, precisely ordered history of all the events is required, such as “change data capture” or event sourcing.",[48,1361,1362,1363,1368,1369],{},"Driver files: ",[55,1364,1367],{"href":1365,"rel":1366},"https:\u002F\u002Fgithub.com\u002Fopenmessaging\u002Fbenchmark\u002Fblob\u002Fmaster\u002Fdriver-pulsar\u002Fpulsar.yaml",[264],"pulsar.yaml",", ",[55,1370,1373],{"href":1371,"rel":1372},"https:\u002F\u002Fgithub.com\u002Fopenmessaging\u002Fbenchmark\u002Fblob\u002Fmaster\u002Fdriver-kafka\u002Fkafka-throughput.yaml",[264],"kafka-throughput.yaml ",[48,1375,1376,1377],{},"Workload file: ",[55,1378,1381],{"href":1379,"rel":1380},"https:\u002F\u002Fgithub.com\u002Fopenmessaging\u002Fbenchmark\u002Fblob\u002Fmaster\u002Fworkloads\u002Fmax-rate-1-topic-1-partition-4p-1c-1kb.yaml",[264],"max-rate-1-topic-1-partition-4p-1c-1kb.yaml",[225,1383,1385],{"id":1384},"a-case-1-results-maximum-throughput-with-1-partition","a. Case #1 Results: Maximum Throughput with 1 Partition",[48,1387,1388],{},[351,1389],{"alt":18,"src":1390},"\u002Fimgs\u002Fblogs\u002F63c71ab1a1ca8b3201e7d469_swCzXmWwN5hXgyKExG4ay8JBL1S7o7YzU8hDSTx_YlS1Ef7i5JWo8AcCyjY6Uo5vMRVOZEdoj13LfKls1xBGoqKLkqzFK20QiTdIlmAzirPjo1-NiVRgmGO0KUt4echv9JBBEolNcsXyPyREPlZiDMDltg52oLAwtOav6EW9UwKp0pB38Lk95vTP2e9K.png",[48,1392,1393,1396],{},[351,1394],{"alt":18,"src":1395},"\u002Fimgs\u002Fblogs\u002F63c73ad5f47e238fa299b754_figure-2-table.png","Figure 2: Single partition max write throughput (MB\u002Fs): Higher is better.",[225,1398,1400],{"id":1399},"b-case-1-analysis","b. 
Case #1 Analysis",[48,1402,1403],{},"The difference in throughput between Pulsar and Kafka reflects how efficiently each system is able to “pipeline” data across the different components, from producers to brokers, as well as the efficiency of each system’s data replication protocol.",[48,1405,1406],{},"Pulsar achieves a throughput of 700 MB\u002Fs (without journaling) and 580 MB\u002Fs (with journaling) on the single partition, compared to Kafka’s 280 MB\u002Fs. This is possible because the Pulsar client library combines messages into batches when sending them to the brokers. The brokers then pipeline data to the storage nodes.",[48,1408,1409],{},"In Kafka, two factors impose a bottleneck on the maximum achievable throughput: (1) the producer default limit of 5 maximum outstanding batches; and (2) the producer buffer size (batch.size=1048576) recommended by Confluent for high throughput.",[48,1411,1412,1413,1417],{},"Note: Increasing the batch.size setting has negative effects on the latency. This is not the case for Pulsar producers, where the batching latency is controlled by the ",[1414,1415,1416],"code",{},"batchingMaxPublishDelay()"," setting, in addition to the batch max size.",[48,1419,1420],{},"With the increase in single topic throughput, Pulsar provides developers and architects more options in how they build their system. Teams can worry less about finding optimal partition keys and focus instead on mapping their data into streams.",[225,1422,1424],{"id":1423},"_2-test-1-case-2-maximum-throughput-with-100-partitions","2. Test #1 \u002F Case #2: Maximum Throughput with 100 Partitions",[48,1426,1427],{},"Most use cases that involve a significant amount of real-time data use partitioning to avoid the bottleneck of a single node. Partitioning is a way for messaging systems to divide a single topic into smaller chunks that can be assigned to different brokers.",[48,1429,1430],{},"Given that we tested on a 3-node cluster, we used 100 partitions to maximize the throughput of the system across the nodes. There is no advantage to using a higher number of partitions on this cluster because the partitions are handled independently and spread uniformly across the available brokers.",[48,1432,1433,1434,1368,1437],{},"Driver file: ",[55,1435,1367],{"href":1365,"rel":1436},[264],[55,1438,1373],{"href":1371,"rel":1439},[264],[48,1441,1376,1442],{},[55,1443,1446],{"href":1444,"rel":1445},"https:\u002F\u002Fgithub.com\u002Fopenmessaging\u002Fbenchmark\u002Fblob\u002Fmaster\u002Fworkloads\u002F1-topic-100-partitions-1kb-4p-4c-2000k.yaml",[264],"1-topic-100-partitions-1kb-4p-4c-2000k.yaml",[225,1448,1450],{"id":1449},"a-case-2-results-maximum-throughput-with-100-partitions","a. Case #2 Results: Maximum Throughput with 100 Partitions",[48,1452,1453,1456,1459],{},[351,1454],{"alt":18,"src":1455},"\u002Fimgs\u002Fblogs\u002F63c71ab1aaacb6f5ecad1b50_SWJaDwgVnLYGckeUhJnwVDTu1vSvZfQ2pqc8-WBP2QfdKIkydqSyT3RBQBNF6WIvQwL_0OM1k6U0vpia7q4VD269rFXqLlXdlDxkwdw3-lOyRU5CFpOZFXxv-HivbuRjK42gxOToo5DfMcrepufOfMwc_BdLQRNH3Mnsdrfq4fiWHosNq1POqyMVe76v.png",[351,1457],{"alt":18,"src":1458},"\u002Fimgs\u002Fblogs\u002F63c73b6d3d155a4ae5b6f20c_figure-3-table.png","Figure 3: 100 partitions max write throughput (MB\u002Fs): Higher is better.",[225,1461,1463],{"id":1462},"b-case-2-analysis","b. Case #2 Analysis",[48,1465,1466],{},"Pulsar without Journaling achieves a throughput of 1600 MB\u002Fs, Kafka achieves 1087 MB\u002Fs, and Pulsar with Journaling (Default) achieves 800 MB\u002Fs. 
At equivalent durability guarantees, Pulsar is able to outperform Kafka in terms of maximum write throughput. The difference in performance stems from how Kafka implements access to the disk. Kafka stores data for each partition in different directories and files, resulting in more files open for writing and scattering the IO operations across the disk. This increases the stress and contention on the OS page caching system that Kafka relies on.",[48,1468,1469],{},"When reading a file, the OS tries to cache blocks of data in the available system RAM. When the data is not available in the OS cache, the thread is blocked while the data is read from the disk and pulled into the cache.",[48,1471,1472],{},"The cost of pulling this data into the cache is a significant delay (~100s of milliseconds) in serving write\u002Fread requests for other topics. This delay is observed in the benchmark results in the form of the publish latency experienced by the producers.",[48,1474,1475],{},"In the case of the default Pulsar deployment (with a journal for strong durability), the throughput is lower because 1 disk (out of 2 available in the VMs) is dedicated to the journal. Therefore, we are capping the available IO bandwidth. In a production environment, this cap could be mitigated by having more disks to increase the IOPS\u002Fnode capacity, but for this benchmark we used the same VM resources for each of the system configurations.",[48,1477,1478],{},"The difference in throughput can impact the cost of the solution. With parity of guarantees, this test shows that Pulsar would require 32% less hardware compared to Kafka for the same amount of traffic.",[818,1480,1482],{"id":1481},"b-test-2-publish-latency","B. Test #2: Publish Latency",[48,1484,1485],{},"The purpose of this test is to measure the latency perceived by the producers at a steady state, with a fixed publish rate.",[48,1487,1488],{},"Messaging systems are often used in applications where data must efficiently and reliably be moved from a producing application to be durably stored in the messaging system. In high-volume scenarios, even momentary increases in latency can result in memory resources being exhausted. In other situations, a human user may be “in-the-loop” and waiting on an operation which publishes a message - for example, a web page needs the confirmation of the action before proceeding - and latency spikes can degrade the user experience. In these use cases, it is important to have a latency performance profile that is consistently within a given SLA (service-level agreement).",[48,1490,1491],{},"It is also important to consider that a high latency in the long tail (e.g., the 99.9th percentile and above) will still have an outsized impact on the SLA that an application can offer. 
In practical terms, a higher 99.9% latency in the producer will often result in a significantly higher 99% latency for the application request.",[48,1493,1494],{},"Because the messaging bus sits at the bottom of the stack, it needs to provide a low and consistent latency profile so that applications can provide their own latency SLAs.",[48,1496,1497],{},"This test is conducted by publishing and consuming at a fixed rate of 500 MB\u002Fs and measuring the publish latency seen by the producers.",[48,1499,1433,1500,1368,1503],{},[55,1501,1367],{"href":1365,"rel":1502},[264],[55,1504,1507],{"href":1505,"rel":1506},"https:\u002F\u002Fgithub.com\u002Fopenmessaging\u002Fbenchmark\u002Fblob\u002Fmaster\u002Fdriver-kafka\u002Fkafka-latency.yaml",[264],"kafka-latency.yaml ",[48,1509,1376,1510],{},[55,1511,1514],{"href":1512,"rel":1513},"https:\u002F\u002Fgithub.com\u002Fopenmessaging\u002Fbenchmark\u002Fblob\u002Fmaster\u002Fworkloads\u002F1-topic-100-partitions-1kb-4p-4c-500k.yaml",[264],"1-topic-100-partitions-1kb-4p-4c-500k.yaml",[225,1516,1518],{"id":1517},"a-test-2-results-publish-latency","a. Test #2 Results: Publish Latency",[48,1520,1521,1524,1527],{},[351,1522],{"alt":18,"src":1523},"\u002Fimgs\u002Fblogs\u002F63c71ab1a1ca8b854fe7d468_MCUf-xMXk9i4GST8unRDS1C5AoCBLtHiEfyiIQ320_FUKIeP4K8urFfhEv-TDFxSPoUuvWvDRmdvWiUKJvy_pyxHui9h1CM84FAhXcBle8zq1cmq25qkheT_EmDeHulx2UBXiSQzaVYOoReLM1c9JgprXdWsV8-1Cb--HapmjH1VHWIYtPPHF6OYbXO2.png",[351,1525],{"alt":18,"src":1526},"\u002Fimgs\u002Fblogs\u002F63c73a3a2753b445eb5fee87_figure-4-table.png","Figure 4: 500K Rate publish latency percentiles (ms): Lower is better.",[225,1529,1531],{"id":1530},"b-test-2-analysis","b. Test #2 Analysis",[48,1533,1534],{},"In this test, Pulsar is able to maintain a low publish latency while sustaining a high per-node utilization. Pulsar without Journaling is able to sustain 1.58 milliseconds of latency at the 99th percentile, and Pulsar with Journaling is able to sustain 7.89 milliseconds.",[48,1536,1537],{},"Kafka maintains a low publish latency up to the 99th percentile, where it is able to sustain 3.46 milliseconds. But at the 99.9th percentile, Kafka’s latency spikes to 54.56 ms.",[48,1539,1540],{},"When publishing at a fixed rate below the max burst throughput, Pulsar has lower latency than Kafka at the 99.9th percentile and above, both with Journaling (default) and without Journaling.",[48,1542,1543],{},"The reasons for lower latency with Pulsar are:",[1188,1545,1546,1549],{},[342,1547,1548],{},"When running Pulsar without Journaling, the critical data write path is decoupled from disk access, so it is not susceptible to the noise introduced by IO operations. The data is copied only in memory (unlike the OS page cache, which blocks under high-load situations) and is then flushed to disk by background threads.",[342,1550,1551],{},"Pulsar with Journaling (Default) is able to maintain low latency because the BookKeeper replication protocol is able to ignore the slowest responding storage node. Due to the internal disk garbage collection mechanism, the performance profile of SSD and NVMe disks is characterized by good average write latency but with periodic latency spikes of up to 100 milliseconds. BookKeeper is able to smooth out the latency when used in a 3\u002F3\u002F2 configuration, because it only waits for the two fastest storage nodes for each entry.",[48,1553,1554],{},"By contrast, Kafka’s replication protocol is set to wait for all three of the brokers in the in-sync replica set. 
Because of that, unless a broker crashes or falls behind the leader for more than 30 seconds, each entry in Kafka needs to wait for all three brokers to store it.",[818,1556,1558],{"id":1557},"c-test-3-catch-up-reads","C. Test #3: Catch-up Reads",[48,1560,1561],{},"In the consumer catch-up test, we build a backlog of data and then start the consumers. While the consumers catch up, the writers continue publishing data at the same rate.",[48,1563,1564],{},"This is a common, real-life scenario for a messaging\u002Fstreaming system. Below are a few common use cases:",[1188,1566,1567,1570,1573],{},[342,1568,1569],{},"Consumers come back online after a few hours of downtime and try to catch up.",[342,1571,1572],{},"New consumers get bootstrapped and replay the data in the topic.",[342,1574,1575],{},"Periodic batch jobs that scan and process the historical data stored in the topic.",[48,1577,1578],{},"With this test, we can measure the following:",[1188,1580,1581],{},[342,1582,1583],{},"The catch-up speed.",[339,1585,1586],{},[342,1587,1588],{},"Consuming applications want to be able to recover as fast as possible, draining all the pending backlog and catching up with the producers in the shortest time.",[1188,1590,1591],{"start":19},[342,1592,1593],{},"The ability to avoid performance degradation and isolate workloads.",[339,1595,1596],{},[342,1597,1598],{},"Producing applications need to be decoupled and isolated from consuming applications and also from different, unrelated topics in the same cluster.",[48,1600,1601],{},"The size of the backlog is 512 GB. It is larger than the RAM available in the nodes in order to simulate the case where the data does not fit entirely in cache and the storage systems are forced to read from disk.",[48,1603,1433,1604,1368,1607],{},[55,1605,1367],{"href":1365,"rel":1606},[264],[55,1608,1507],{"href":1505,"rel":1609},[264],[48,1611,1376,1612],{},[55,1613,1616],{"href":1614,"rel":1615},"https:\u002F\u002Fgithub.com\u002Fopenmessaging\u002Fbenchmark\u002Fblob\u002Fmaster\u002Fworkloads\u002F1-topic-100-partitions-1kb-4p-4c-200k-backlog.yaml",[264],"1-topic-100-partitions-1kb-4p-4c-200k-backlog.yaml",[225,1618,1620],{"id":1619},"a-test-3-results-catch-up-reads","a. 
Test #3 Results: Catch-up Reads",[48,1622,1623,1626,1629,1630,1633,1636,1637,1640,1643],{},[351,1624],{"alt":18,"src":1625},"\u002Fimgs\u002Fblogs\u002F63c71ab15ed199fbb1d6e088_Fvef71g8AHCQAbbo6Uo-1Wv9iGMbP9nxd1nnDndi8bYNpYt8dYOuVy5XATUl0wO4UaOX3wYzlIvWjBQbK-kd7X1-rHWti2QdQku7AfFcUGGZKuStYq7eO2_42r5tsdFi4Z3a_H3_ccu0K9XFb1o3LzASHvzK5aeKg5AYZ_H8vyfQlsePegBX34w79NYv.png",[351,1627],{"alt":18,"src":1628},"\u002Fimgs\u002Fblogs\u002F63c73d2c5ecd19269dfb2aec_figure-5a-table.png","Figure 5a: Catch-up read throughput (msg\u002Fs): Higher is better.",[351,1631],{"alt":18,"src":1632},"\u002Fimgs\u002Fblogs\u002F63c71ab225436150ccf4fd11_BgqqL7qDd8JC9zjC87183t6d2y6-iUGF0rBJey9vyzsvhpyp8vPctxWhSq9MbsOm2UixgQAfjm1cjv3iDSMiEibCPMUVyHcaPBGvOwAISevM0BlhEgEPW8lsUiE6XEeu3gMVEeG8gUhnrMEOIAcRpAV43jROuT85hRbGbKGDQ9YBQh_jkgYPLt0UcxkW.png",[351,1634],{"alt":18,"src":1635},"\u002Fimgs\u002Fblogs\u002F63c73d8d187a390bc382b477_figure-5b-table.png","Figure 5b: Catch-up read chase time (seconds): Shorter is better.",[351,1638],{"alt":18,"src":1639},"\u002Fimgs\u002Fblogs\u002F63c71ab1c37fd1acad9a0bcc_q7zKC60ZrjFQbUSYvSodbtz88-VKxy5JcapxW7CENWDfQmS2v7P47Jo4jDqChoMrPUqU7CQlWje6t6XM9mXAL13HEeDPiPPcp-LjWA3DfAsULd-bdcogG2Z9jJlyq45GpZrwHrGVlXysHtCYI9MZgFwgp3LYIfjkXPkbNxpFy8EyXKeUPagQPVAPlJar.png",[351,1641],{"alt":18,"src":1642},"\u002Fimgs\u002Fblogs\u002F63c73db681d346249886ddd7_figure-5c-table.png","Figure 5c: Impact publish latency during catchup read (ms): Lower is better.",[225,1645,1647],{"id":1646},"b-test-3-analysis","b. Test #3 Analysis",[48,1649,1650],{},"The test shows that Pulsar consumers are able to drain the backlog of data ~2.5x faster than Kafka consumers, without impacting the performance of the connected producers.",[48,1652,1653],{},"With Kafka, the test showed that while the consumers are catching up, the producers are heavily impacted, with 99% latencies up to ~700 milliseconds and consequent throughput reductions.",[48,1655,1656],{},"The increase in latency is caused by the contention on the OS page cache used by Kafka. When the size of the backlog of data exceeds the RAM available in the Kafka broker, the OS will start to evict pages from the cache. This causes page cache misses that stop the Kafka threads. When there are enough producers and consumers in a broker, it becomes easy to end up in a “cache-thrashing” scenario, where time is spent paging data in from the disk and evicting it from the cache soon after.",[48,1658,1659],{},"In contrast, Pulsar with BookKeeper adopts a more sophisticated approach to write and read operations. Pulsar does not rely on the OS page cache because BookKeeper has its own set of write and read caches, for which the eviction and pre-fetching are specifically designed for streaming storage use cases.",[48,1661,1662],{},"This test demonstrates the degradation that consumers can cause in a Kafka cluster. This impacts the performance of the Kafka cluster and can lead to reliability problems.",[40,1664,931],{"id":930},[48,1666,1667],{},"The benchmark demonstrates Apache Pulsar’s ability to provide high performance across a broad range of use cases. In particular, Pulsar provides better and more predictable performance, even for the use cases that are generally associated with Kafka, such as large volume streaming data over partitioned topics. 
Key highlights on the Pulsar versus Kafka performance comparison include:",[1188,1669,1670,1673,1676],{},[342,1671,1672],{},"Pulsar provides 99th-percentile write latency \u003C1.6ms without the journal and \u003C8ms with the journal at a fixed 500MB\u002Fs write throughput. The latency profile does not degrade at the higher quantiles, while Kafka’s latency quickly spikes up to 100s of milliseconds.",[342,1674,1675],{},"Pulsar can provide up to 3.2 GB\u002Fs of historical data read throughput, 60% more than Kafka, which can only achieve 2.0 GB\u002Fs.",[342,1677,1678],{},"During historical data reading, Pulsar’s I\u002FO isolation provides a low and consistent publish latency, two orders of magnitude lower than Kafka’s. This ensures that the real-time data stream will not be affected when reading historical data.",[32,1680,1682],{"id":1681},"pulsar-unified-messaging-streaming-and-the-future","Pulsar: Unified Messaging & Streaming, and the Future",[48,1684,1685,1686,190],{},"While Pulsar is often adopted for streaming use cases, it also provides a superset of features and is widely adopted for message queuing use cases and for use cases that require unified messaging and streaming capabilities. This benchmark did not cover the message queuing capabilities of Pulsar, but you can learn more in the Pulsar Launches 2.8.0, Unified Messaging and Streaming ",[55,1687,1689],{"href":1688},"\u002Fblog\u002Fapache-pulsar-launches-2-8-unified-messaging-streaming-transactions","blog",[48,1691,1692],{},"Beyond the development of Pulsar’s capabilities, the Pulsar ecosystem continues to expand. Protocol handlers allow Pulsar brokers to natively communicate via other protocols, such as Kafka and RabbitMQ, enabling teams to easily integrate existing applications with Pulsar. Integrations with Apache Pinot, Delta Lake, Apache Spark, and Apache Flink make Pulsar an ideal choice for teams that want to use one technology across both the data and application tiers.",[48,1694,1695],{},"For more on Pulsar, check out the resources below.",[48,1697,1698],{},[1133,1699],{"value":1135},[32,1701,1703],{"id":1702},"want-to-learn-more","Want to Learn More?",[1188,1705,1706,1713,1720,1726,1739,1746],{},[342,1707,1708,1709,190],{},"To learn more about how Pulsar compares to Kafka, visit this ",[55,1710,1712],{"href":1711},"\u002Fpulsar\u002Fpulsar-vs-kafka","page",[342,1714,1715,1716,1719],{},"Read this ",[55,1717,1689],{"href":1718},"\u002Fblog\u002Funderstanding-pulsar-10-minutes-guide-kafka-users"," to bootstrap your knowledge by translating your existing Apache Kafka experience.",[342,1721,1722,1723,190],{},"To learn more about Apache Pulsar use cases, check out this ",[55,1724,1712],{"href":1725},"\u002Fcontent-type-filtring-system\u002Fsuccess-stories",[342,1727,1728,1729,1733,1734],{},"Interested in spinning up a Pulsar cluster in minutes using StreamNative Cloud? ",[55,1730,1732],{"href":1731},"\u002Fthank\u002Fcontact-us","Contact us"," today. ",[55,1735,1738],{"href":1736,"rel":1737},"https:\u002F\u002Fhubs.ly\u002FQ016_Wgd0",[264],"‍",[342,1740,1741,1745],{},[55,1742,1744],{"href":1736,"rel":1743},[264],"Sign up"," for the monthly StreamNative Newsletter for Apache Pulsar.",[342,1747,1748,1753],{},[55,1749,1752],{"href":1750,"rel":1751},"https:\u002F\u002Fwww.academy.streamnative.io\u002F",[264],"Learn Pulsar"," from the original creators of Pulsar. 
Watch on-demand videos, enroll in self-paced courses, and complete our certification program to demonstrate your Pulsar knowledge.",[32,1755,1757],{"id":1756},"about-streamnative","About StreamNative",[48,1759,1760,1761,190],{},"Founded by the original creators of Apache Pulsar, the StreamNative team has more experience deploying and running Pulsar than any company in the world. StreamNative offers a cloud-native, scalable, resilient, and secure messaging and event streaming solution powered by Apache Pulsar. With StreamNative Cloud, you get a fully-managed Apache-Pulsar-as-a-Service offering available in our cloud or yours. Learn more at ",[55,1762,1764],{"href":1763},"\u002Fabout","Streamnative.io",[32,1766,966],{"id":965},[48,1768,1769,972,1771],{},[970,1770,1168],{},[55,1772,1775],{"href":1773,"rel":1774},"https:\u002F\u002Fhubs.ly\u002FQ016_P830",[264],"The Linux Foundation Open Messaging Benchmark suite",[48,1777,1778,972,1780],{},[970,1779,1241],{},[55,1781,1784],{"href":1782,"rel":1783},"https:\u002F\u002Fhubs.ly\u002FQ016_PcP0",[264],"The Open Messaging Benchmark Github repo",[48,1786,1787,972,1790],{},[970,1788,1789],{},"3",[55,1791,1793],{"href":1792},"\u002Fblog\u002Fperspective-on-pulsars-performance-compared-to-kafka","A More Accurate Perspective on Pulsar’s Performance",[48,1795,1738],{},[48,1797,1798],{},[1133,1799],{"value":1135},{"title":18,"searchDepth":19,"depth":19,"links":1801},[1802,1807,1812],{"id":333,"depth":19,"text":334,"children":1803},[1804,1805,1806],{"id":1140,"depth":279,"text":1141},{"id":1147,"depth":279,"text":1148},{"id":1154,"depth":279,"text":1155},{"id":1161,"depth":19,"text":1162,"children":1808},[1809,1810,1811],{"id":1175,"depth":279,"text":1176},{"id":1227,"depth":279,"text":1228},{"id":1344,"depth":279,"text":1345},{"id":930,"depth":19,"text":931,"children":1813},[1814,1815,1816,1817],{"id":1681,"depth":279,"text":1682},{"id":1702,"depth":279,"text":1703},{"id":1756,"depth":279,"text":1757},{"id":965,"depth":279,"text":966},"Apache Pulsar","2022-04-07","This Apache Pulsar versus Apache Kafka report focuses purely on comparing the technical performance based on benchmark tests.","\u002Fimgs\u002Fblogs\u002F63c7fa9e7c4ed1b41436347f_63b3ed64316977607d62783a_blog_cover_pulsar_vs_kafka.png",{},"\u002Fblog\u002Fapache-pulsar-vs-apache-kafka-2022-benchmark","12 min read",{"title":1088,"description":1820},"blog\u002Fapache-pulsar-vs-apache-kafka-2022-benchmark",[1084,1818,1828,1829],"Benchmarks","Observability","lZD3D6Fx3oahLRZEB_N8Wg9iqTd7Yx0U7UpF3ujauk0",[1832,1847],{"id":1833,"title":1090,"bioSummary":1834,"email":290,"extension":8,"image":1835,"linkedinUrl":1836,"meta":1837,"position":1844,"stem":1845,"twitterUrl":290,"__hash__":1846},"authors\u002Fauthors\u002Fmatteo-merli.md","Matteo is the CTO at StreamNative, where he brings rich experience in distributed pub-sub messaging platforms. Matteo was one of the co-creators of Apache Pulsar during his time at Yahoo!. Matteo worked to create a global, distributed messaging system for Yahoo!, which would later become Apache Pulsar. Matteo is the PMC Chair of Apache Pulsar, where he helps to guide the community and ensure the success of the Pulsar project. He is also a PMC member for Apache BookKeeper. 
Matteo lives in Menlo Park, California.","\u002Fimgs\u002Fauthors\u002Fmatteo-merli.webp","https:\u002F\u002Fwww.linkedin.com\u002Fin\u002Fmatteomerli\u002F",{"body":1838},{"type":15,"value":1839,"toc":1842},[1840],[48,1841,1834],{},{"title":18,"searchDepth":19,"depth":19,"links":1843},[],"CTO, StreamNative & Co-Creator and PMC Chair Apache Pulsar","authors\u002Fmatteo-merli","MRLEjDgpe8SqHBoftSh_eiNGg-1oCJ30t7iV3Bb2NzQ",{"id":1848,"title":313,"bioSummary":1849,"email":290,"extension":8,"image":1850,"linkedinUrl":1851,"meta":1852,"position":1859,"stem":1860,"twitterUrl":1861,"__hash__":1862},"authors\u002Fauthors\u002Fpenghui-li.md","Penghui Li is passionate about helping organizations to architect and implement messaging services. Prior to StreamNative, Penghui was a Software Engineer at Zhaopin.com, where he was the leading Pulsar advocate and helped the company adopt and implement the technology. He is an Apache Pulsar Committer and PMC member.","\u002Fimgs\u002Fauthors\u002Fpenghui-li.webp","https:\u002F\u002Fwww.linkedin.com\u002Fin\u002Fpenghui-li-244173184\u002F",{"body":1853},{"type":15,"value":1854,"toc":1857},[1855],[48,1856,1849],{},{"title":18,"searchDepth":19,"depth":19,"links":1858},[],"Director of Streaming, StreamNative & Apache Pulsar PMC Member","authors\u002Fpenghui-li","https:\u002F\u002Ftwitter.com\u002Flipenghui6","WDjET7GfxqVQJ8mTEMaRhgpxRdDy18qZkgQDJlwjvbI",[1864,1871,1878],{"path":1865,"title":1088,"date":1866,"image":1867,"link":-1,"collection":1868,"resourceType":1869,"score":1870,"id":1865},"\u002Freports\u002Fapache-pulsar-vs-apache-kafka-2022-benchmark","2022-12-23","\u002Fimgs\u002Fwhitepapers\u002F63bd307e8c4076bac9f1b090_pulsar-vs-kafka.png","reports","Report",1.1,{"path":1872,"title":1873,"date":1874,"image":-1,"link":-1,"collection":1875,"resourceType":1876,"score":1877,"id":1872},"\u002Fblog\u002Fbenchmarking-pulsar-and-kafka-report-2020","Benchmarking Pulsar and Kafka - The Full Benchmark Report - 2020","2020-11-09","blogs","Blog",0.75,{"path":1101,"title":1879,"date":1866,"image":1880,"link":-1,"collection":1881,"resourceType":1882,"score":1883,"id":1101},"Benchmarking Pulsar and Kafka: A More Accurate Perspective on Pulsar's Performance","\u002Fimgs\u002Fwhitepapers\u002F63aecd39801cf457af016505_open-graph-benchmark-pulsar-vs-kafka.webp","whitepapers","Whitepaper",0.66,1775716416950]