[{"data":1,"prerenderedAt":1744},["ShallowReactive",2],{"active-banner":3,"navbar-featured-partner-blog":24,"navbar-pricing-featured":306,"blog-\u002Fblog\u002Ffrom-streams-to-lakestreams":1086,"blog-authors-\u002Fblog\u002Ffrom-streams-to-lakestreams":1616,"related-\u002Fblog\u002Ffrom-streams-to-lakestreams":1726},{"id":4,"title":5,"date":6,"dismissible":7,"extension":8,"link":9,"link2":10,"linkText":11,"linkText2":12,"meta":13,"stem":21,"variant":22,"__hash__":23},"banners\u002Fbanners\u002Flakestream-ufk-launch.md","StreamNative Introduces Lakestream Architecture and Launches Native Kafka Service","2026-04-07",true,"md","\u002Fblog\u002Ffrom-streams-to-lakestreams","https:\u002F\u002Fconsole.streamnative.cloud\u002Fsignup?from=banner_lakestream-launch","Read Announcement","Sign Up Now",{"body":14},{"type":15,"value":16,"toc":17},"minimark",[],{"title":18,"searchDepth":19,"depth":19,"links":20},"",2,[],"banners\u002Flakestream-ufk-launch","default","zRueBGutATZB0ZnFFHwaEV7F0Di4tnZUHhgOiI4cu6k",{"id":25,"title":26,"authors":27,"body":29,"canonicalUrl":289,"category":290,"createdAt":289,"date":291,"description":292,"extension":8,"featured":7,"image":293,"isDraft":294,"link":289,"meta":295,"navigation":7,"order":296,"path":297,"readingTime":298,"relatedResources":289,"seo":299,"stem":300,"tags":301,"__hash__":305},"blogs\u002Fblog\u002Fstreamnative-recognized-in-the-forrester-wave-streaming-data-platforms-2025.md","StreamNative Recognized as a Contender in The Forrester Wave™: Streaming Data Platforms, Q4 2025",[28],"David Kjerrumgaard",{"type":15,"value":30,"toc":276},[31,39,47,51,67,73,78,81,87,102,109,115,118,124,127,134,140,143,146,157,163,169,172,175,178,184,191,194,197,204,207,210,224,229,233,237,241,245,249,251,268,270],[32,33,35],"h3",{"id":34},"receives-highest-possible-scores-in-both-the-messaging-and-resource-optimization-criteria",[36,37,38],"em",{},"Receives Highest Possible Scores in BOTH the Messaging and Resource Optimization 
Criteria",[40,41,43],"h2",{"id":42},"introduction",[44,45,46],"strong",{},"Introduction",[48,49,50],"p",{},"Real-time data has become the backbone of modern innovation. As artificial intelligence (AI) and digital services demand instantaneous insights, organizations are realizing that streaming data is no longer optional – it's essential for delivering timely, context-rich experiences. StreamNative's data streaming platform is built precisely for this reality, ensuring data is immediate, reliable, and ready to power critical applications.",[48,52,53,54,63,64],{},"Today, we're excited to announce that Forrester Research has named StreamNative as a Contender in its evaluation, ",[55,56,58],"a",{"href":57},"\u002Freports\u002Frecognized-in-the-forrester-wave-tm-streaming-data-platforms-q4-2025",[36,59,60],{},[44,61,62],{},"The Forrester Wave™: Streaming Data Platforms, Q4 2025",". This report evaluated 15 top streaming data platform providers, and we're proud to share that ",[44,65,66],{},"StreamNative received the highest scores possible—5 out of 5—in both the Messaging and Resource Optimization criteria.",[48,68,69,70],{},"***Forrester's Take: ***",[36,71,72],{},"\"StreamNative is a good fit for enterprises that want an Apache Pulsar implementation that is also compatible with Kafka APIs.\"",[48,74,75],{},[36,76,77],{},"— The Forrester Wave™: Streaming Data Platforms, Q4 2025",[48,79,80],{},"Being recognized in the Forrester Wave is a proud milestone, and for us, it highlights how far StreamNative has come in enabling enterprises to unlock the power of real-time data. 
In the sections below, we'll dive into what we believe sets StreamNative apart—from our modern architecture and cloud-native design to our open-source foundation and real-time use cases—and how we see these strengths aligning with Forrester's findings.",[40,82,84],{"id":83},"trusted-by-industry-leaders",[44,85,86],{},"Trusted by Industry Leaders",[48,88,89,90,93,94,97,98,101],{},"Companies across industries are already leveraging StreamNative to drive real-time outcomes. Global enterprises like ",[44,91,92],{},"Cisco"," rely on StreamNative to handle massive IoT telemetry, supporting 245 million+ connected devices. Martech leaders such as ",[44,95,96],{},"Iterable"," process billions of events per day with StreamNative for hyper-personalized customer engagement. And in financial services, ",[44,99,100],{},"FICO"," trusts StreamNative to power its real-time fraud detection and analytics pipelines with a secure, scalable streaming backbone.",[48,103,104,105,108],{},"The Forrester report notes that, “",[36,106,107],{},"Customers appreciate the lower infrastructure costs that result from StreamNative’s cost-efficient, Kafka-compatible architecture. Customers note excellent support responsiveness…","”",[40,110,112],{"id":111},"modern-cloud-native-architecture-built-for-scale",[44,113,114],{},"Modern, Cloud-Native Architecture Built for Scale",[48,116,117],{},"From day one, StreamNative was designed with a modern architecture to meet the demanding scale and flexibility requirements of real-time data. Unlike legacy streaming systems that often rely on tightly coupled storage and compute, StreamNative's platform takes a cloud-native approach: it decouples these layers to enable elastic scalability and efficient resource utilization across any environment. The core is powered by Apache Pulsar—a distributed messaging and streaming engine—enhanced with multi-protocol support (including native Apache Kafka API compatibility) to unify diverse data streams under one roof. 
This means organizations can consolidate siloed messaging systems and handle both high-volume event streams and traditional message queues on a single platform, without sacrificing performance or reliability.",[48,119,120,121,108],{},"Forrester's evaluation described that “",[36,122,123],{},"StreamNative aims to provide a high-performance, multi-protocol streaming data platform: It uses Apache Pulsar with Kafka API compatibility to deliver cost-efficient, real-time applications for enterprises. It appeals to organizations that want a flexible, low-cost streaming solution, due to its focus on scalability and resource optimization, while its investments in Pulsar’s open-source ecosystem and performance optimization make it the primary platform for enterprises wishing to implement Pulsar.",[48,125,126],{},"Our cloud-first, leaderless architecture (with no single broker bottlenecks) and tiered storage model were built to maximize throughput and cost-efficiency for real-time workloads. By separating compute from storage and leveraging distributed object storage, StreamNative can retain huge volumes of event data indefinitely while keeping compute costs in check—effectively providing a flexible, low-cost streaming solution.",[48,128,129,130,133],{},"This modern design not only delivers high performance, but also ensures fault tolerance and geo-distribution out of the box, so enterprises can trust their streaming data is always available and durable. 
As Forrester’s evaluation noted, StreamNative ",[36,131,132],{},"\"excels at messaging and resource optimization\" and “Its platform supports use cases like real-time analytics and event-driven architectures with robust scalability.","” Our architecture provides the strong foundation that today's real-time applications demand, from ultra-fast data ingestion to seamless scale-out across hybrid and multi-cloud environments.",[40,135,137],{"id":136},"open-source-foundation-and-pulsar-expertise",[44,138,139],{},"Open Source Foundation and Pulsar Expertise",[48,141,142],{},"StreamNative's DNA is rooted in open source innovation. Our founders are the original creators of Apache Pulsar, and we've built our platform with the same open principles: freedom, flexibility, and community-driven innovation. For developers and data teams, this means adopting StreamNative comes with no proprietary lock-in. Instead, you get a platform built on open standards and a thriving ecosystem. We offer broad API compatibility (Pulsar, Kafka, JMS, MQTT, and more) so that teams can work with familiar interfaces and integrate StreamNative into existing systems with ease.",[48,144,145],{},"StreamNative is the primary commercial contributor to the Apache Pulsar project and its surrounding ecosystem. We invest heavily in Pulsar's ongoing improvements, and our investments in Pulsar's open-source ecosystem and performance optimization bolster StreamNative's value. We also foster a vibrant community through initiatives like the Data Streaming Summit and free training resources.",[48,147,148,149,152,153,156],{},"Forrester's assessment noted that StreamNative’s “",[36,150,151],{},"events-driven agents, extensibility, and performance architecture are solid,","” and we're continuing to build on that foundation. 
",[44,154,155],{},"We're actively investing in expanding our tooling for observability, governance, schema management, and developer productivity","—areas we recognize as critical for enterprise adoption and where we're committed to accelerating our roadmap.",[48,158,159,160],{},"Being open also means embracing an open ecosystem of technologies. StreamNative actively integrates with the tools and platforms that matter most to our users. We partner with industry leaders like Snowflake, Databricks, Google, and Ververica to ensure our streaming platform works seamlessly with data warehouses, lakehouse storage, and stream processing frameworks. Forrester’s evaluation observed that StreamNative’s ",[36,161,162],{},"\"investments in Pulsar’s open-source ecosystem and performance optimization make it the primary platform for enterprises wishing to implement Pulsar.\"",[40,164,166],{"id":165},"powering-real-time-use-cases-across-industries",[44,167,168],{},"Powering Real-Time Use Cases Across Industries",[48,170,171],{},"One of the greatest validations of StreamNative's approach is the success our customers are achieving with real-time data. StreamNative's platform is versatile and use-case agnostic—if an application demands high-volume, low-latency data movement, we can power it. This flexibility is why our customer base spans industries from finance and IoT to major automobile manufacturers and online gaming. The common thread is that these organizations need to process and react to data in milliseconds, and StreamNative is delivering the capabilities to make that possible.",[48,173,174],{},"Cisco uses StreamNative to underpin an IoT telemetry system of colossal scale, connecting hundreds of millions of devices and thousands of enterprise clients with real-time data streams. The platform's multi-tenant design and proven reliability allow Cisco to offer its customers a live feed of device data with unwavering confidence. 
In the financial sector, FICO has built streaming pipelines on StreamNative to detect fraud as transactions happen and to monitor systems in real time. With StreamNative's strong guarantees around message durability and ordering, FICO can catch anomalies or suspicious patterns within seconds. And in digital customer engagement, Iterable relies on StreamNative to process billions of events every day—clicks, views, purchases—so that marketers can trigger personalized campaigns instantly based on user behavior.",[48,176,177],{},"Our customers uniformly deal with mission-critical data streams, where downtime or delays are unacceptable. StreamNative's fault-tolerant, scalable infrastructure has proven equal to the task, handling scenarios like bursting to millions of events per second or seamlessly spanning multiple cloud regions. Forrester's report recognized StreamNative for supporting event-driven architectures with robust scalability—which for us is a reflection of our platform's ability to meet the most demanding enterprise requirements.",[40,179,181],{"id":180},"continuing-to-innovate-ursa-orca-and-the-road-ahead",[44,182,183],{},"Continuing to Innovate: Ursa, Orca, and the Road Ahead",[48,185,186,187,190],{},"While we are thrilled to be recognized in Forrester's Streaming Data Platforms Wave, we view this as just the beginning. StreamNative's vision has always been bold: to ",[44,188,189],{},"provide a unified platform that not only handles today's streaming needs but also anticipates the emerging requirements of tomorrow",".",[48,192,193],{},"One key area of focus is the convergence of streaming data with advanced analytics and AI. As Forrester points out in the report, technology leaders should look for platforms that natively integrate messaging, stream processing, and analytics to provide AI agents with real-time, contextualized information. We couldn't agree more. 
Our award-winning Ursa Engine and Orca Agent Engine are aimed at extending our platform up the stack—bridging the gap between data streams and data lakes, and between event streams and intelligent processing.",[48,195,196],{},"Our new Ursa Engine introduces a lakehouse-native approach to streaming: it can write events directly to table formats like Iceberg on cloud storage, eliminating entire classes of ETL jobs and making fresh data instantly available for analytics queries. By integrating streaming and lakehouse technologies, we help customers collapse data silos and accelerate their AI\u002FML pipelines.",[48,198,199,200,203],{},"Beyond analytics integration, we are also enhancing StreamNative with more out-of-the-box processing and governance capabilities. In the coming months, we plan to introduce new features for lightweight stream processing and transformation, making it easier to build reactive applications directly on the platform. We're also expanding our ecosystem of connectors and integrations, so that whether your data lands in Snowflake, Databricks, or an AI model, StreamNative will seamlessly feed it. ",[44,201,202],{},"We're investing significantly in enterprise features including security, schema registry, governance, and monitoring tooling","—capabilities that are essential for mission-critical deployments and where we're committed to continued improvement.",[48,205,206],{},"This recognition from Forrester energizes us to keep innovating at full speed. We're sharing this honor with our amazing customers, community, and partners who drive us forward every day. Your feedback and real-world challenges have helped shape StreamNative into what it is today, and together, we will shape the future of streaming data. Thank you for joining us on this journey—we're just getting started, and we can't wait to deliver even more value as we continue to evolve our platform. 
Onward to real-time everything!",[208,209],"hr",{},[32,211,213],{"id":212},"streamnative-in-the-forrester-wave-evaluation-findings",[44,214,215,216,223],{},"StreamNative in ",[44,217,218],{},[55,219,220],{"href":57},[44,221,222],{},"The Forrester Wave™",": Evaluation Findings",[225,226,228],"h5",{"id":227},"recognized-as-a-contender-among-15-streaming-data-platform-providers","• Recognized as a Contender among 15 streaming data platform providers",[225,230,232],{"id":231},"received-the-highest-scores-possible-50-in-both-the-messaging-and-resource-optimization-criteria","* Received the highest scores possible (5.0) in both the Messaging and Resource Optimization criteria",[225,234,236],{"id":235},"cited-as-the-primary-platform-for-enterprises-wishing-to-implement-pulsar","• Cited as the primary platform for enterprises wishing to implement Pulsar",[225,238,240],{"id":239},"noted-for-excelling-at-messaging-and-resource-optimization","• Noted for excelling at messaging and resource optimization",[225,242,244],{"id":243},"customers-cited-lower-infrastructure-costs-and-excellent-support-responsiveness","• Customers cited lower infrastructure costs and excellent support responsiveness",[225,246,248],{"id":247},"recognized-for-supporting-event-driven-architectures-with-robust-scalability","• Recognized for supporting event-driven architectures with robust scalability",[208,250],{},[252,253,255,256,259,260,190],"h6",{"id":254},"forrester-disclaimer-forrester-does-not-endorse-any-company-product-brand-or-service-included-in-its-research-publications-and-does-not-advise-any-person-to-select-the-products-or-services-of-any-company-or-brand-based-on-the-ratings-included-in-such-publications-information-is-based-on-the-best-available-resources-opinions-reflect-judgment-at-the-time-and-are-subject-to-change-for-more-information-read-about-forresters-objectivity-here","**Forrester Disclaimer: **",[36,257,258],{},"Forrester does not endorse any company, product, brand, or service 
included in its research publications and does not advise any person to select the products or services of any company or brand based on the ratings included in such publications. Information is based on the best available resources. Opinions reflect judgment at the time and are subject to change",". *For more information, read about Forrester’s objectivity *",[55,261,265],{"href":262,"rel":263},"https:\u002F\u002Fwww.forrester.com\u002Fabout-us\u002Fobjectivity\u002F",[264],"nofollow",[36,266,267],{},"here",[208,269],{},[252,271,273],{"id":272},"apache-apache-pulsar-apache-kafka-apache-flink-and-other-names-are-trademarks-of-the-apache-software-foundation-no-endorsement-by-apache-or-other-third-parties-is-implied",[36,274,275],{},"Apache®, Apache Pulsar®, Apache Kafka®, Apache Flink® and other names are trademarks of The Apache Software Foundation. No endorsement by Apache or other third parties is implied.",{"title":18,"searchDepth":19,"depth":19,"links":277},[278,280,281,282,283,284,285],{"id":34,"depth":279,"text":38},3,{"id":42,"depth":19,"text":46},{"id":83,"depth":19,"text":86},{"id":111,"depth":19,"text":114},{"id":136,"depth":19,"text":139},{"id":165,"depth":19,"text":168},{"id":180,"depth":19,"text":183,"children":286},[287],{"id":212,"depth":279,"text":288},"StreamNative in The Forrester Wave™: Evaluation Findings",null,"Company","2025-12-16","StreamNative is recognized in The Forrester Wave™: Streaming Data Platforms, Q4 2025. 
Discover why Forrester highlights StreamNative's high-performance messaging, efficient resource use, and cost-effective Kafka API compatibility for real-time innovation.","\u002Fimgs\u002Fblogs\u002F693bd36cf01b217dcb67278f_Streamnative_blog_thumbnail.png",false,{},0,"\u002Fblog\u002Fstreamnative-recognized-in-the-forrester-wave-streaming-data-platforms-2025","10 mins read",{"title":26,"description":292},"blog\u002Fstreamnative-recognized-in-the-forrester-wave-streaming-data-platforms-2025",[302,303,304],"Announcements","Real-Time","Forrester","5Nr1vAcqlQ7yFQfdL0a3MLsNFerVmEOQJXD9Twz5lx8",{"id":307,"title":308,"authors":309,"body":314,"canonicalUrl":289,"category":1073,"createdAt":289,"date":1074,"description":1075,"extension":8,"featured":7,"image":1076,"isDraft":294,"link":289,"meta":1077,"navigation":7,"order":296,"path":1078,"readingTime":1079,"relatedResources":289,"seo":1080,"stem":1081,"tags":1082,"__hash__":1085},"blogs\u002Fblog\u002Fhow-we-run-a-5-gb-s-kafka-workload-for-just-50-per-hour.md","How We Run a 5 GB\u002Fs Kafka Workload for Just $50 per Hour",[310,311,312,313],"Matteo Meril","Neng Lu","Hang Chen","Penghui Li",{"type":15,"value":315,"toc":1043},[316,319,322,325,328,331,335,338,348,354,357,365,370,374,381,384,387,395,399,402,407,411,414,417,420,423,432,436,439,450,453,457,460,463,474,477,481,485,493,496,500,508,537,541,544,549,553,556,560,563,566,571,580,585,588,591,602,606,609,620,624,627,630,635,638,667,671,673,679,682,687,692,695,699,713,717,728,732,747,756,767,770,773,777,780,783,794,797,800,803,808,813,817,821,838,842,856,861,865,876,879,895,899,910,915,920,928,932,935,939,946,950,953,962,967,976,982,991,1000,1009,1018,1027,1035],[48,317,318],{},"The rise of DeepSeek has shaken the AI infrastructure market, forcing companies to confront the escalating costs of training and deploying AI models. 
But the real pressure point isn’t just compute—it’s data acquisition and ingestion costs.",[48,320,321],{},"As businesses rethink their AI cost-containment strategies, real-time data streaming is emerging as a critical enabler. The growing adoption of Kafka as a standard protocol has expanded cost-efficient options, allowing companies to optimize streaming analytics while keeping expenses in check.",[48,323,324],{},"Ursa, the data streaming engine powering StreamNative’s managed Kafka service, is built for this new reality. With its leaderless architecture and native lakehouse storage integration, Ursa eliminates costly inter-zone network traffic for data replication and client-to-broker communication while ensuring high availability at minimal operational cost.",[48,326,327],{},"In this blog post, we benchmarked the infrastructure cost and total cost of ownership (TCO) for running a 5GB\u002Fs Kafka workload across different Kafka vendors, including Redpanda, Confluent WarpStream, and AWS MSK. Our benchmark results show that Ursa can sustain 5GB\u002Fs Kafka workloads at just 5% of the cost of traditional streaming engines like Redpanda—making it the ideal solution for high-performance, cost-efficient ingestion and data streaming for data lakehouses and AI workloads.",[48,329,330],{},"Note: We also evaluated vanilla Kafka in our benchmark; however, for simplicity, we have focused our cost comparison on vendor solutions rather than self-managed deployments. That said, it is important to highlight that both Redpanda and vanilla Kafka use a leader-based data replication approach. In a data-intensive, network-bound workload like 5GB\u002Fs streaming, with the same machine type and replication factor, Redpanda and vanilla Kafka produced nearly identical cost profiles.",[40,332,334],{"id":333},"key-benchmark-findings","Key Benchmark Findings",[48,336,337],{},"Ursa delivered 5 GB\u002Fs of sustained throughput at an infrastructure cost of just $54 per hour. 
For comparison:",[339,340,341,345],"ul",{},[342,343,344],"li",{},"MSK: $303 per hour → 5.6x more expensive compared to Ursa",[342,346,347],{},"Redpanda: $988 per hour → 18x more expensive compared to Ursa",[48,349,350],{},[351,352],"img",{"alt":18,"src":353},"\u002Fimgs\u002Fblogs\u002F679c71b67d9046f26edc7977_AD_4nXfvTqyBNUBu2lObdkKAx-5UNkpNP8UYULLZyOcixE6z99VMZUUEsUqWjzexI7vjyNGRNSAUoM9smYvdTP55ctAhIbrs5lmQgcSVMWdaoigbWouCl95DVSQsxooY-qqfGcYqS4g4zA.png",[48,355,356],{},"Beyond infrastructure costs, when factoring in both storage pricing, vendor pricing and operational expenses, Ursa’s total cost of ownership (TCO) for a 5GB\u002Fs workload with a 7-day retention period is:",[339,358,359,362],{},[342,360,361],{},"50% cheaper than Confluent WarpStream",[342,363,364],{},"85% cheaper than MSK and Redpanda",[48,366,367],{},[351,368],{"alt":18,"src":369},"\u002Fimgs\u002Fblogs\u002F679c602d77e9c706de5343b8_AD_4nXeDv8rrv_C1CTCCiqYo1zpvlGYbdBk1r0VEqovAPu22iFMQZgh54Hfw9PBMLzM7jDFxKwAFDxbdG0np4XVk_tGsWhEKMloLRcmmea7lvueCx-0cFsyaE3Mya4Mxc1Dox95A6JEc.png",[40,371,373],{"id":372},"ursa-highly-cost-efficient-data-streaming-at-scale","Ursa: Highly Cost-Efficient Data Streaming at Scale",[48,375,376,380],{},[55,377,379],{"href":378},"\u002Fblog\u002Fursa-reimagine-apache-kafka-for-the-cost-conscious-data-streaming","Ursa"," is a next-generation data streaming engine designed to deliver high performance at a fraction of the cost of traditional disk-based solutions. It is fully compatible with Apache Kafka and Apache Pulsar APIs, while leveraging a leaderless, lakehouse-native architecture to maximize scalability, efficiency, and cost savings.",[48,382,383],{},"Ursa’s key innovation is separating storage from compute and decoupling metadata\u002Findex operations from data operations by utilizing cloud object storage (e.g., AWS S3) instead of costly inter-zone disk-based replication. 
It also employs open lakehouse formats (Iceberg and Delta Lake), enabling columnar compression to significantly reduce storage costs while maintaining durability and availability.",[48,385,386],{},"In contrast, traditional streaming systems—like Kafka and Redpanda—depend on leader-based architectures, which drive up inter-zone traffic costs due to replication and client communication. Ursa mitigates these costs by:",[339,388,389,392],{},[342,390,391],{},"Eliminating inter-zone traffic costs via a leaderless architecture.",[342,393,394],{},"Replacing costly inter-zone replication with direct writes to cloud storage using open lakehouse formats.",[40,396,398],{"id":397},"how-ursa-eliminates-inter-zone-traffic","How Ursa Eliminates Inter-Zone Traffic",[48,400,401],{},"Ursa minimizes inter-zone traffic by leveraging a leaderless architecture, which eliminates inter-zone communication between clients and brokers, and lakehouse-native storage, which removes the need for inter-zone data replication. 
This approach ensures high availability and scalability while avoiding unnecessary cross-zone data movement.",[48,403,404],{},[351,405],{"alt":18,"src":406},"\u002Fimgs\u002Fblogs\u002F679c602e21b3571bb7117dca_AD_4nXd7Oahc77NjRLNvA9clLt0tsyU6MrIqVibFYv5pW5giTIcCHPr3EA_yTGzfVEUIVO3VXK56qWK8zmBCp5lY0E_4nmlWIPFrHjtHylA5NhwELjn-UB0fLG2h_kbrxrc7Cs_edvveNA.png",[32,408,410],{"id":409},"leaderless-architecture","Leaderless architecture",[48,412,413],{},"Traditional streaming engines such as Kafka, Pulsar, or RedPanda rely on a leader-based model, where each partition is assigned to a single leader broker that handles all writes and reads.",[48,415,416],{},"Pros of Leader-Based Architectures:\n✔ Maintains message ordering via local sequence IDs\n✔ Delivers low latency and high performance through message caching",[48,418,419],{},"Cons of Leader-Based Architectures:\n✖ Throughput bottlenecked by a single broker per partition\n✖ Inter-zone traffic required for high availability in multi-AZ deployments",[48,421,422],{},"While Kafka and Pulsar offer partial solutions (e.g., reading from followers, shadow topics) to reduce read-related inter-zone traffic, producers still send data to a single leader.",[48,424,425,426,431],{},"Ursa removes the concept of topic ownership, allowing any broker in the cluster to handle reads or writes for any partition. 
The primary challenge—ensuring message ordering—is solved with ",[55,427,430],{"href":428,"rel":429},"https:\u002F\u002Fgithub.com\u002Fstreamnative\u002Foxia",[264],"Oxia",", a scalable metadata and index service created by StreamNative in 2022.",[32,433,435],{"id":434},"oxia-the-metadata-layer-enabling-leaderless-architecture","Oxia: The Metadata Layer Enabling Leaderless Architecture",[48,437,438],{},"Ensuring message ordering in a leaderless architecture is complex, but Ursa solves this with Oxia:",[339,440,441,444,447],{},[342,442,443],{},"Handles millions of metadata\u002Findex operations per second",[342,445,446],{},"Generates sequential IDs to maintain strict message ordering",[342,448,449],{},"Optimized for Kubernetes with horizontal scalability",[48,451,452],{},"Producers and consumers can connect to any broker within their local AZ, eliminating inter-zone traffic costs while maintaining performance through localized caching.",[32,454,456],{"id":455},"zero-interzone-data-replication","Zero interzone data replication",[48,458,459],{},"In most distributed systems, data replication from a leader (primary) to followers (replicas) is crucial for fault tolerance and availability. 
However, replication across zones can inflate infrastructure expenses substantially.",[48,461,462],{},"Ursa avoids these costs by writing data directly to cloud storage (e.g., AWS S3, Google Cloud Storage):",[339,464,465,468,471],{},[342,466,467],{},"Built-In Resilience: Cloud storage inherently offers high availability and fault tolerance without inter-zone traffic fees.",[342,469,470],{},"Tradeoff: Slightly higher latency (sub-second, with p99 at 500 milliseconds) compared to local disk\u002FEBS (single-digit to sub-100 milliseconds), in exchange for significantly lower costs (up to 10x lower).",[342,472,473],{},"Flexible Modes: Ursa is an addition to the classic BookKeeper-based engine, providing users with the flexibility to optimize for either cost or low latency based on their workload requirements.",[48,475,476],{},"By forgoing conventional replication, Ursa slashes inter-zone traffic costs and associated complexities, making it a compelling option for organizations seeking to balance high-performance data streaming with strict budget constraints.",[40,478,480],{"id":479},"how-we-ran-a-5-gbs-test-with-ursa","How We Ran a 5 GB\u002Fs Test with Ursa",[32,482,484],{"id":483},"ursa-cluster-deployment","Ursa Cluster Deployment",[339,486,487,490],{},[342,488,489],{},"9 brokers across 3 availability zones, each on m6i.8xlarge (Fixed 12.5 Gbps bandwidth, 32 vCPU cores, 128 GB memory).",[342,491,492],{},"Oxia cluster (metadata store) with 3 nodes of m6i.8xlarge, distributed across three availability zones (AZs).",[48,494,495],{},"During peak throughput (5 GB\u002Fs), each broker’s network usage was about 10 Gbps.",[32,497,499],{"id":498},"openmessaging-benchmark-workers-configuration","OpenMessaging Benchmark Workers & Configuration",[48,501,502,503,507],{},"The OpenMessaging Benchmark (OMB) Framework is a suite of tools that makes it easy to benchmark distributed messaging systems in the cloud. 
Please check ",[55,504,505],{"href":505,"rel":506},"https:\u002F\u002Fopenmessaging.cloud\u002Fdocs\u002Fbenchmarks\u002F",[264]," for details.",[339,509,510,525,534],{},[342,511,512,513,518,519,524],{},"12 OMB workers: 6 for ",[55,514,517],{"href":515,"rel":516},"https:\u002F\u002Fgist.github.com\u002Fcodelipenghui\u002Fd1094122270775e4f1580947f80c5055",[264],"producers",", 6 for ",[55,520,523],{"href":521,"rel":522},"https:\u002F\u002Fgist.github.com\u002Fcodelipenghui\u002F06bada89381fb77a7862e1b4c1d8963d",[264],"consumers"," across 3 availability zones, on m6i.8xlarge instances. Each worker is configured with 12 CPU cores and 48 GB memory.",[342,526,527,528,533],{},"Sample YAML ",[55,529,532],{"href":530,"rel":531},"https:\u002F\u002Fgist.github.com\u002Fcodelipenghui\u002F204c1f26c4d44a218ae235bf2de99904",[264],"scripts"," provided for Kafka-compatible configuration and rate limits.",[342,535,536],{},"Achieved consistent 5 GB\u002Fs publish\u002Fsubscribe throughput.",[40,538,540],{"id":539},"ursa-benchmark-tests-results","Ursa Benchmark Tests & Results",[48,542,543],{},"The following diagram demonstrates that Ursa can consistently handle 5 GB\u002Fs of traffic, fully saturating the network across all broker nodes.",[48,545,546],{},[351,547],{"alt":18,"src":548},"\u002Fimgs\u002Fblogs\u002F679c602d7b261bac1113f7d6_AD_4nXdDPsRc3koXICiFF0bqSmGWbJt_RlUy4FE3ruuWOfbCfpcqZ1dejjqGbkaCJv2hQFL1nirRouBVRW2l5uMWBvY9naMqGB_wHcLI14dBM0f85TXhmdm3UxEv1yGX9Y4hf5FttSkZew.png",[40,550,552],{"id":551},"comparing-infrastructure-cost","Comparing Infrastructure Cost",[48,554,555],{},"This benchmark first evaluates infrastructure costs of running a 5 GB\u002Fs streaming workload (1:1 producer-to-consumer ratio) across different data streaming engines, including Ursa, Redpanda, and AWS MSK, with a focus on multi-AZ deployments to ensure a fair comparison.",[32,557,559],{"id":558},"test-setup-key-assumptions","Test Setup & Key Assumptions",[48,561,562],{},"All tests use multi-AZ 
configurations, with clusters and clients distributed across three AWS availability zones (AZs). Cluster size scales proportionally to the number of AZs, and rack-awareness is enabled for all engines to evenly distribute topic partitions and leaders.",[48,564,565],{},"To ensure a fair comparison, we selected the same machine type capable of fully utilizing both network and storage bandwidth for Ursa and Redpanda in this 5GB\u002Fs test:",[339,567,568],{},[342,569,570],{},"9 × m6i.8xlarge instances",[48,572,573,574,579],{},"However, MSK's storage bandwidth limits vary depending on the selected instance type, with the highest allowed limit capped at 1000 MiB\u002Fs per broker, according to",[55,575,578],{"href":576,"rel":577},"https:\u002F\u002Fdocs.aws.amazon.com\u002Fmsk\u002Flatest\u002Fdeveloperguide\u002Fmsk-provision-throughput-management.html#throughput-bottlenecks",[264]," AWS documentation",". Given this constraint, achieving 5 GB\u002Fs throughput with a replication factor of 3 required the following setup:",[339,581,582],{},[342,583,584],{},"15 × kafka.m7g.8xlarge (32 vCPUs, 128 GB memory, 15 Gbps network, 4000 GiB EBS).",[48,586,587],{},"This configuration was necessary to work around MSK's storage bandwidth limitations, ensuring a comparable cost basis to other evaluated streaming engines.",[48,589,590],{},"Additional key assumptions include:",[339,592,593,596,599],{},[342,594,595],{},"Inter-AZ producer traffic: For leader-based engines, two-thirds of producer-to-broker traffic crosses AZs due to leader distribution.",[342,597,598],{},"Consumer optimizations: Follower fetch is enabled across all tests, eliminating inter-AZ consumer traffic.",[342,600,601],{},"Storage cost exclusions: This benchmark only evaluates streaming costs, assuming no long-term data retention.",[32,603,605],{"id":604},"inter-broker-replication-costs","Inter-Broker Replication Costs",[48,607,608],{},"Inter-broker (cross-AZ) replication is a major cost driver for data streaming 
engines:",[339,610,611,614,617],{},[342,612,613],{},"Redpanda: Inter-broker replication is not free; copying data across multiple availability zones incurs substantial data transfer costs.",[342,615,616],{},"AWS MSK: Inter-broker replication is free, but MSK instance pricing is significantly higher (e.g., $3.264 per hour for kafka.m7g.8xlarge vs. $1.306 per hour for an on-demand m7g.8xlarge). MSK storage costs $0.10 per GB-month, more than double the $0.045 per GB-month of st1 volumes. Even though replication is free, client-to-broker traffic still incurs inter-AZ charges.",[342,618,619],{},"Ursa: Its leaderless architecture involves no inter-broker replication, eliminating inter-zone replication costs entirely.",[32,621,623],{"id":622},"zone-affinity-reducing-inter-az-costs","Zone Affinity: Reducing Inter-AZ Costs",[48,625,626],{},"We evaluated zone affinity mechanisms to further reduce inter-AZ data transfer costs.",[48,628,629],{},"Consumers:",[339,631,632],{},[342,633,634],{},"Follower fetch is enabled across all tests, so consumers fetch data from replicas in their local AZ, which eliminates inter-zone consumer traffic except for metadata lookups.",[48,636,637],{},"Producers:",[339,639,640,649,658],{},[342,641,642,643,648],{},"The Kafka protocol lacks an easy way to enforce producer AZ affinity (though ",[55,644,647],{"href":645,"rel":646},"https:\u002F\u002Fcwiki.apache.org\u002Fconfluence\u002Fdisplay\u002FKAFKA\u002FKIP-1123:+Rack-aware+partitioning+for+Kafka+Producer",[264],"KIP-1123"," aims to address this). 
Even then, KIP-1123 applies only when the default partitioner is used (i.e., when no record partition or record key is specified).",[342,650,651,652,657],{},"Redpanda recently introduced ",[55,653,656],{"href":654,"rel":655},"https:\u002F\u002Fdocs.redpanda.com\u002Fredpanda-cloud\u002Fdevelop\u002Fproduce-data\u002Fleader-pinning\u002F",[264],"leader pinning",", but this only benefits setups where producers are confined to a single AZ, which does not apply to our multi-AZ benchmark.",[342,659,660,661,666],{},"Ursa is the only system in this test with ",[55,662,665],{"href":663,"rel":664},"https:\u002F\u002Fdocs.streamnative.io\u002Fdocs\u002Fconfig-kafka-client#eliminate-cross-az-networking-traffic",[264],"built-in zone affinity for both producers and consumers",". It achieves this by embedding producer AZ information in client.id, which lets metadata lookups route clients to local-AZ brokers and eliminates inter-AZ producer traffic.",[32,668,670],{"id":669},"cost-comparison-results","Cost Comparison Results",[48,672,337],{},[339,674,675,677],{},[342,676,344],{},[342,678,347],{},[48,680,681],{},"Ursa’s leaderless architecture, zone affinity, and native cloud storage integration make it the most cost-effective choice in this comparison for high-throughput data streaming workloads.",[48,683,684],{},[351,685],{"alt":18,"src":686},"\u002Fimgs\u002Fblogs\u002F679c72208198ca36a352f228_AD_4nXeeZuM8T-xBlD4Vf3j67K618n08qh8wIDLLtiLJG0ssA1Wj1V26u7wIDTX9sqLrtw8mB2c299dwzarGen62CG0Vh7nWstn5qbPGFcBaKJYEepTsLr5fHWv1U8uqbg8Y0UOK6fJ7.png",[48,688,689],{},[351,690],{"alt":18,"src":691},"\u002Fimgs\u002Fblogs\u002F679c625978031f40229de484_AD_4nXdLkLLJ30KKr-_A_rN1j8akVwBYacAWIPzWHoOReJF421890kfByZoQQxkLczihVSmiw5Q9J51-V9I2SEKITbwsYnANDDTlAVL5nQ_jfaHNTe9VEWhSoa7DZooCnilDYL6l6msmJg.png",[48,693,694],{},"The detailed infrastructure cost calculations for each data streaming engine are listed below:",[32,696,698],{"id":697},"streamnative-ursa","StreamNative - 
Ursa",[339,700,701,704,707,710],{},[342,702,703],{},"Server EC2 costs: 9 * $1.536\u002Fhr ≈ $14",[342,705,706],{},"Client EC2 costs: 9 * $1.536\u002Fhr ≈ $14",[342,708,709],{},"S3 write request costs: 1350 r\u002Fs * $0.005\u002F1000 r * 3600 s ≈ $24",[342,711,712],{},"S3 read request costs: 1350 r\u002Fs * $0.0004\u002F1000 r * 3600 s ≈ $2",[32,714,716],{"id":715},"aws-msk","AWS MSK",[339,718,719,722,725],{},[342,720,721],{},"Server EC2 costs: 15 * $3.264\u002Fhr ≈ $49",[342,723,724],{},"Client EC2 costs: 9 * $1.536\u002Fhr ≈ $14",[342,726,727],{},"Interzone traffic - producer to broker: 5 GB\u002Fs * ⅔ * $0.02\u002FGB (in+out) * 3600 s = $240",[32,729,731],{"id":730},"redpanda","Redpanda",[339,733,734,736,738,741,744],{},[342,735,703],{},[342,737,706],{},[342,739,740],{},"Interzone traffic - producer to broker: 5 GB\u002Fs * ⅔ * $0.02\u002FGB (in+out) * 3600 s = $240",[342,742,743],{},"Interzone traffic - replication: 10 GB\u002Fs * $0.02\u002FGB (in+out) * 3600 s = $720",[342,745,746],{},"Interzone traffic - broker to consumer: $0 (fetch from local zone)",[48,748,749,750,755],{},"Please note that we were unable to test ",[55,751,754],{"href":752,"rel":753},"https:\u002F\u002Fwww.redpanda.com\u002Fblog\u002Fcloud-topics-streaming-data-object-storage",[264],"Redpanda with Cloud Topics",", which has been announced but not released, so it is not yet available for evaluation. Based on the limited information available, Cloud Topics may help reduce inter-zone replication costs, but producers must still cross availability zones to reach topic partition owners, incurring inter-zone traffic costs of up to $240 per hour.",[339,757,758,764],{},[342,759,760,763],{},[55,761,647],{"href":645,"rel":762},[264]," (when implemented) will help mitigate producer-to-broker inter-zone traffic, but it is not yet available. 
It will also apply only when the default partitioner is used (i.e., when no record partition or key is specified).",[342,765,766],{},"Redpanda’s leader pinning helps only when all producers for the pinned topic are confined to a single AZ. In multi-AZ environments (like our benchmark), inter-zone producer traffic remains unavoidable.",[48,768,769],{},"Additionally, Redpanda’s Cloud Topics architecture is not documented publicly. Their blog mentions \"leader placement rules to optimize produce latency and ingress cost,\" but it is unclear whether this represents a shift away from a leader-based architecture or whether it uses techniques similar to Ursa’s zone-aware approach.",[48,771,772],{},"We may revisit this comparison as more details become available.",[40,774,776],{"id":775},"comparing-total-cost-of-ownership","Comparing Total Cost of Ownership",[48,778,779],{},"As highlighted earlier, with a BYOC Ursa setup, you can achieve 5 GB\u002Fs throughput at just 5% of the infrastructure cost of a traditional leader-based data streaming engine, such as Kafka or Redpanda, while managing the infrastructure yourself. This cost reduction is enabled by Ursa’s leaderless architecture and lakehouse-native storage design, which eliminate overhead costs such as inter-zone traffic and leader-based data replication and reduce the resources needed to sustain high throughput.",[48,781,782],{},"Now, let’s examine the total cost comparison, evaluating Ursa alongside other vendors, including those that have adopted a leaderless architecture (e.g., Confluent WarpStream). 
This comparison is based on a 5 GB\u002Fs workload with a 7-day retention period, factoring in both storage and vendor costs. Here are the key findings:",[339,784,785,788,791],{},[342,786,787],{},"Ursa ($164,353\u002Fmonth) is 50% cheaper than Confluent WarpStream ($337,068\u002Fmonth)",[342,789,790],{},"Ursa is 85% cheaper than AWS MSK ($1,115,251\u002Fmonth)",[342,792,793],{},"Ursa is 86% cheaper than Redpanda ($1,202,853\u002Fmonth)",[48,795,796],{},"In addition to its architectural advantages (eliminating most inter-AZ traffic and leveraging lakehouse storage for cost-effective data retention), Ursa adopts a fairer and more cost-efficient pricing model: Elastic Throughput-based pricing. This approach aligns costs with actual usage, avoiding unnecessary overhead.",[48,798,799],{},"Unlike WarpStream, which charges for both storage and throughput, Ursa ensures that customers pay only for the throughput they actively use. Ursa’s pricing is based on compressed data sent by clients, meaning the more data compressed on the client side, the lower the cost. 
In contrast, WarpStream prices are based on uncompressed data, unfairly inflating expenses and failing to incentivize customers to optimize their client applications.",[48,801,802],{},"This distinction is crucial, as compressed data reduces both storage and network costs, making Ursa’s pricing model not only more cost-effective but also more transparent and predictable.",[48,804,805],{},[351,806],{"alt":18,"src":807},"\u002Fimgs\u002Fblogs\u002F679c602d194800c9206d9d58_AD_4nXcFlf755xgyz7htxhMhBV5fGrsxy642mQNodt61DTok_z1dwkw5A6lkO5hatXVneCaB0anbZPAyvLI3MlIMuQEYLEACHHvQMOr5UfaB37dfzkdqewDEvcT-20VGd_zzvJsuA00zGA.png",[48,809,810],{},[351,811],{"alt":18,"src":812},"\u002Fimgs\u002Fblogs\u002F679c62594e9c2e629fae73aa_AD_4nXeU6cOgItnjLsEZCOf13TEvMY_SHWWIxYP2OYUj-B1GUPyWO78OG08K_v03hwYSVcg06f9dqDiGmdwy76vynjmiDGL5bluZ5_XF4nSU_r59oOZdfViXndXt6s11vVOY7qwfZN8v.png",[32,814,816],{"id":815},"cost-breakdown","Cost Breakdown",[818,819,820],"h4",{"id":697},"StreamNative – Ursa",[339,822,823,826,829,832,835],{},[342,824,825],{},"EC2 (Server): 9 × $1.536\u002Fhr × 24 hr × 30 days = $9,953.28",[342,827,828],{},"S3 Write Requests: 1,350 r\u002Fs × $0.005\u002F1,000 r × 3,600 s × 24 hr × 30 days = $17,496",[342,830,831],{},"S3 Read Requests: 1,350 r\u002Fs × $0.0004\u002F1,000 r × 3,600 s × 24 hr × 30 days = $1,400",[342,833,834],{},"S3 Storage Costs: 5 GB\u002Fs × $0.021\u002FGB × 3,600 s × 24 hr × 7 days = $63,504",[342,836,837],{},"Vendor Cost: 200 ETU × $0.50\u002Fhr × 24 hr × 30 days = $72,000",[818,839,841],{"id":840},"warpstream","WarpStream",[339,843,844,847],{},[342,845,846],{},"Based on WarpStream’s pricing calculator (as of January 29, 2025), we assume a 4:1 client data compression ratio, meaning 20 GB\u002Fs of uncompressed data translates to 5 GB\u002Fs of compressed data.",[342,848,849,850,855],{},"It's important to note that WarpStream’s pricing structure has fluctuated frequently throughout January. 
We observed the cost reported by their calculator changing from $409,644 per month to $337,068 per month. This variability has been previously highlighted in the blog post “",[55,851,854],{"href":852,"rel":853},"https:\u002F\u002Fbigdata.2minutestreaming.com\u002Fp\u002Fthe-brutal-truth-about-apache-kafka-cost-calculators",[264],"The Brutal Truth About Kafka Cost Calculators","”. To ensure transparency, we have documented the pricing as of January 29, 2025.",[48,857,858],{},[351,859],{"alt":18,"src":860},"\u002Fimgs\u002Fblogs\u002F679c602e42713e0028e9af5e_AD_4nXcu5_VWTLu9jRYs6zX1MBAOtLQEo5gyfNSWPcbpnQHXTa8qNCFAXezRR2E8daygzYTTwd4dhJjaLaLM8C6y_3OGbu2NS7pdvEv3a8-ptNKOg7AeKnYqPQCAYvQ5EuxzuI3JYIvY.png",[818,862,864],{"id":863},"msk","MSK",[339,866,867,870,873],{},[342,868,869],{},"EC2 (Server): 15 * $3.264\u002Fhr × 24 hr × 30 days = $35,251",[342,871,872],{},"Interzone Traffic (Client-Server): 5 GB\u002Fs × ⅔ × $0.02\u002FGB (in+out) × 3,600 s × 24 hr × 30 days = $172,800",[342,874,875],{},"Storage: 5 GB\u002Fs × $0.1\u002FGB-month × 3,600 s × 24 hr × 7 days * 3 replicas = $907,200",[818,877,731],{"id":878},"redpanda-1",[339,880,881,884,886,889,892],{},[342,882,883],{},"EC2 (Server): 9 × $1.536\u002Fhr × 24 hr × 30 days = $9953",[342,885,872],{},[342,887,888],{},"Interzone Traffic (Replication): 5 GB\u002Fs × 2 × $0.02\u002FGB (in+out) × 3,600 s × 24 hr × 30 days = $518,400",[342,890,891],{},"Storage: 5 GB\u002Fs × $0.045\u002FGB-month(st1) × 3,600 s × 24 hr × 7 days * 3 replicas = $408,240",[342,893,894],{},"Vendor Cost: $93,333 per month (based on limited information. See additional notes below).",[818,896,898],{"id":897},"additional-notes","Additional Notes",[339,900,901],{},[342,902,903,904,909],{},"Redpanda does not publicly disclose its BYOC pricing, making it difficult to accurately assess its total costs. 
We refer to information from the whitepaper “",[55,905,908],{"href":906,"rel":907},"https:\u002F\u002Fwww.redpanda.com\u002Fresources\u002Fredpanda-vs-confluent-performance-tco-benchmark-report#form",[264],"Redpanda vs. Confluent: A Performance and TCO Benchmark Report by McKnight Consulting Group.","” for estimation purposes. Based on the Tier-8 pricing model in the whitepaper,  the estimated cost to support a 5GB\u002Fs workload would be $1.12 million per year ($93,333 per month). However, since this calculation is based on an estimation, we will revisit and refine the cost assessment once Redpanda publishes its BYOC pricing.",[48,911,912],{},[351,913],{"alt":18,"src":914},"\u002Fimgs\u002Fblogs\u002F679c602dc8a9859eed89a0ef_AD_4nXdbcO8vsNNPy4GtkNLlmNKf22fjxRvzLzH7CtOna1L08sTbvnZx3HhufeFqc1w4K2gEF7lxO2IR5supotxebAiGnA07Qa8Yr3Rd1pVK2LYKK4WurlJGwgdwwucZIFoF-N_2oBjY.png",[48,916,917],{},[351,918],{"alt":18,"src":919},"\u002Fimgs\u002Fblogs\u002F679c602d6bc1c2287e012540_AD_4nXfcHZnLfjbjIr3ZAgoQXT9dwP3aQCOQPmGZZJUtpNZSwE6qY6M3yehIaBxCwxEIeu5PVdUPY0zhyjnow26YfgjdYgSG4GnV9ibxu0YWTIpwng6z_F6FUGJMpERMKtpsFESzXSN_Sw.png",[339,921,922,925],{},[342,923,924],{},"When estimating the storage costs for Kafka and Redpanda, we assume the use of HDD storage at $0.045\u002FGB, based on the premise that both systems can fully utilize disk bandwidth without incurring the higher costs associated with GP2 or GP3 volumes. However, in practice, many users opt for GP2 or GP3, significantly increasing the total storage cost for Kafka and Redpanda.",[342,926,927],{},"Unlike disk-based solutions, S3 storage does not require capacity preallocation—Ursa only incurs costs for the actual data stored. This contrasts with Kafka and Redpanda, where preallocating storage can drive up expenses. 
As a result, the real-world storage costs for Kafka and Redpanda are often 50% higher than the estimates above.",[40,929,931],{"id":930},"conclusion","Conclusion",[48,933,934],{},"Ursa represents a transformative shift in streaming data infrastructure, offering cost efficiency, scalability, and flexibility without compromising durability or reliability. By leveraging a leaderless architecture and eliminating inter-zone data replication, Ursa runs at about 5% of the infrastructure cost, and roughly 85% lower total cost of ownership, compared to traditional leader-based streaming engines like Kafka and Redpanda. Its direct integration with cloud storage and scalable metadata & index management via Oxia ensure high availability and simplified infrastructure management.",[32,936,938],{"id":937},"balancing-latency-and-cost","Balancing Latency and Cost",[48,940,941,945],{},[55,942,944],{"href":943},"\u002Fblog\u002Fcap-theorem-for-data-streaming","Ursa accepts slightly higher latency in exchange for ultra-low cost",", making it an ideal choice for the majority of streaming workloads, especially those that prioritize throughput and cost savings over ultra-low latency. Meanwhile, StreamNative’s BookKeeper-based engine remains the preferred solution for real-time, latency-sensitive applications. By combining these two approaches, StreamNative gives customers the flexibility to choose the right engine for their specific needs, whether that means maximizing cost savings or achieving ultra-low-latency real-time performance.",[48,947,949],{"id":948},"the-future-of-streaming-infrastructure","The Future of Streaming Infrastructure",[48,951,952],{},"In an era where data fuels AI, analytics, and real-time decision-making, managing infrastructure costs is critical to sustaining innovation. 
Ursa is not just a cost-cutting alternative—it is a forward-thinking, lakehouse-native platform that redefines how modern data streaming infrastructure should be built and operated.",[48,954,955,956,961],{},"Whether your priority is reducing costs, improving flexibility, or ingesting massive data into lakehouses, Ursa delivers a future-proof solution for the evolving demands of real-time data streaming. ",[55,957,960],{"href":958,"rel":959},"https:\u002F\u002Fconsole.streamnative.cloud\u002F",[264],"Get started"," with StreamNative Ursa today!",[963,964,966],"h1",{"id":965},"references","References",[48,968,969,972,973],{},[970,971,430],"span",{}," ",[55,974,975],{"href":975},"\u002Fblog\u002Fintroducing-oxia-scalable-metadata-and-coordination",[48,977,978,972,980],{},[970,979,379],{},[55,981,378],{"href":378},[48,983,984,972,987],{},[970,985,986],{},"StreamNative pricing",[55,988,989],{"href":989,"rel":990},"https:\u002F\u002Fdocs.streamnative.io\u002Fdocs\u002Fbilling-overview",[264],[48,992,993,972,996],{},[970,994,995],{},"WarpStream pricing",[55,997,998],{"href":998,"rel":999},"https:\u002F\u002Fwww.warpstream.com\u002Fpricing#pricingfaqs",[264],[48,1001,1002,972,1005],{},[970,1003,1004],{},"AWS S3 pricing",[55,1006,1007],{"href":1007,"rel":1008},"https:\u002F\u002Faws.amazon.com\u002Fs3\u002Fpricing\u002F",[264],[48,1010,1011,972,1014],{},[970,1012,1013],{},"AWS EBS pricing",[55,1015,1016],{"href":1016,"rel":1017},"https:\u002F\u002Faws.amazon.com\u002Febs\u002Fpricing\u002F",[264],[48,1019,1020,972,1023],{},[970,1021,1022],{},"AWS MSK pricing",[55,1024,1025],{"href":1025,"rel":1026},"https:\u002F\u002Faws.amazon.com\u002Fmsk\u002Fpricing\u002F",[264],[48,1028,1029,972,1032],{},[970,1030,1031],{},"The Brutal Truth about Kafka Cost Calculators",[55,1033,852],{"href":852,"rel":1034},[264],[48,1036,1037,972,1040],{},[970,1038,1039],{},"Redpanda vs. 
Confluent: A Performance and TCO Benchmark Report by McKnight Consulting Group",[55,1041,906],{"href":906,"rel":1042},[264],{"title":18,"searchDepth":19,"depth":19,"links":1044},[1045,1046,1047,1052,1056,1057,1066,1069],{"id":333,"depth":19,"text":334},{"id":372,"depth":19,"text":373},{"id":397,"depth":19,"text":398,"children":1048},[1049,1050,1051],{"id":409,"depth":279,"text":410},{"id":434,"depth":279,"text":435},{"id":455,"depth":279,"text":456},{"id":479,"depth":19,"text":480,"children":1053},[1054,1055],{"id":483,"depth":279,"text":484},{"id":498,"depth":279,"text":499},{"id":539,"depth":19,"text":540},{"id":551,"depth":19,"text":552,"children":1058},[1059,1060,1061,1062,1063,1064,1065],{"id":558,"depth":279,"text":559},{"id":604,"depth":279,"text":605},{"id":622,"depth":279,"text":623},{"id":669,"depth":279,"text":670},{"id":697,"depth":279,"text":698},{"id":715,"depth":279,"text":716},{"id":730,"depth":279,"text":731},{"id":775,"depth":19,"text":776,"children":1067},[1068],{"id":815,"depth":279,"text":816},{"id":930,"depth":19,"text":931,"children":1070},[1071,1072],{"id":937,"depth":279,"text":938},{"id":948,"depth":279,"text":949},"StreamNative Cloud","2025-01-31","Discover how Ursa achieves 5GB\u002Fs Kafka workloads at just 5% of the cost of traditional streaming engines like Redpanda and AWS MSK. 
See our benchmark results comparing infrastructure costs, total cost of ownership (TCO), and performance across leading Kafka vendors.","\u002Fimgs\u002Fblogs\u002F679c6593d25099b1cdcec4ca_image-31.png",{},"\u002Fblog\u002Fhow-we-run-a-5-gb-s-kafka-workload-for-just-50-per-hour","30 min",{"title":308,"description":1075},"blog\u002Fhow-we-run-a-5-gb-s-kafka-workload-for-just-50-per-hour",[1083,1084,303],"TCO","Apache Kafka","CDUawvFKTs_AD8usvmIcTleU3mbfA0QAoPZM6xfVuo8",{"id":1087,"title":1088,"authors":1089,"body":1093,"canonicalUrl":289,"category":290,"createdAt":289,"date":6,"description":1606,"extension":8,"featured":7,"image":1607,"isDraft":294,"link":289,"meta":1608,"navigation":7,"order":296,"path":9,"readingTime":289,"relatedResources":289,"seo":1609,"stem":1610,"tags":1611,"__hash__":1615},"blogs\u002Fblog\u002Ffrom-streams-to-lakestreams.md","From Streams to Lakestreams: The Next Paradigm in Data Infrastructure",[1090,1091,28,1092,313,312,311],"Sijie Guo","Matteo Merli","Kundan Vyas",{"type":15,"value":1094,"toc":1593},[1095,1098,1105,1112,1115,1125,1129,1132,1135,1146,1151,1156,1163,1166,1170,1177,1180,1183,1194,1197,1206,1209,1212,1216,1219,1229,1232,1235,1240,1243,1254,1257,1262,1267,1271,1274,1285,1288,1291,1294,1305,1308,1311,1319,1323,1326,1329,1332,1335,1338,1341,1346,1349,1352,1355,1358,1362,1365,1368,1374,1377,1385,1389,1392,1395,1402,1405,1409,1412,1417,1422,1427,1430,1441,1448,1451,1456,1459,1462,1473,1478,1481,1484,1487,1493,1496,1499,1502,1506,1509,1515,1521,1527,1533,1539,1542,1547,1552,1556,1559,1570,1573,1576,1579,1582,1587,1590],[48,1096,1097],{},"When we founded StreamNative, we set out to build the world's best data\nstreaming platform. 
We succeeded --- but the more interesting discovery\ncame from what we found along the way: streaming, on its own, is only\npart of the story.",[48,1099,1100,1101,1104],{},"StreamNative was founded by the original creators of ",[44,1102,1103],{},"Apache Pulsar",",\na system born inside Yahoo to handle unified messaging and real-time\ndata movement at a scale few organizations ever face. When we\nopen-sourced that work and brought it to the broader market, we believed\nthe core problem was speed and scale. We were right --- but we were\nasking a narrower question than the industry actually needed us to\nanswer.",[48,1106,1107,1108,1111],{},"What enterprises kept running into wasn't a streaming problem in\nisolation. It was an ",[36,1109,1110],{},"integration"," problem: how does real-time data\ncoexist with the analytical systems, storage layers, and AI workloads\nthat define the modern data stack? The answer, we came to realize,\npoints toward something bigger than streaming --- a unified lakehouse\narchitecture where real-time and historical data aren't siloed, but\ngenuinely converge.",[48,1113,1114],{},"This post is about that journey: from building a streaming company, to\nrecognizing a fundamental architectural shift --- and why we believe it\nmatters for every organization serious about making data a competitive\nadvantage.",[48,1116,1117,1118,190],{},"We call it\n",[55,1119,1122],{"href":1120,"rel":1121},"https:\u002F\u002Fstreamnative.io\u002Flakestream",[264],[44,1123,1124],{},"Lakestream",[40,1126,1128],{"id":1127},"starting-with-pulsar-and-learning-the-limits-of-protocols","Starting with Pulsar -- and Learning the Limits of Protocols",[48,1130,1131],{},"When we built Apache Pulsar at Yahoo, we were solving a problem that\ndidn't have a good answer yet: how do you build a multi-tenant, unified\nmessaging and streaming platform capable of handling millions of topics,\nbillions of messages, and the operational complexity of a global\ninternet company --- 
all on shared infrastructure?",[48,1133,1134],{},"Some of the architectural decisions we made turned out to be more\nconsequential than we realized at the time.",[48,1136,1137,1138,1141,1142,1145],{},"The first was ",[44,1139,1140],{},"compute-storage separation",". While other streaming\nsystems tightly coupled brokers to local disks, we built Pulsar around\n",[44,1143,1144],{},"Apache BookKeeper"," as an independent, distributed storage layer.\nBrokers were stateless. Storage was durable and decoupled. In 2012, this\nwas an unconventional bet. Today, it's a foundational design pattern\nacross virtually every large-scale data system --- from cloud data\nwarehouses to modern streaming platforms. We didn't predict the future;\nwe just kept following the engineering logic until it led somewhere\ninteresting.",[48,1147,1148],{},[351,1149],{"alt":18,"src":1150},"\u002Fimgs\u002Fblogs\u002Ffrom-streams-to-lakestreams-image3.png",[48,1152,1153],{},[36,1154,1155],{},"Figure 1. From Monolith to Compute\u002FStorage Separation",[48,1157,1158,1159,1162],{},"The second was ",[44,1160,1161],{},"messaging semantics",". We embedded rich subscription\nmodels --- including shared subscriptions --- directly into the client\nprotocol from the beginning, because real enterprise workloads demanded\nthem. Kafka added shared subscriptions more than a decade later. We're\nnot pointing this out to score points; we're pointing it out because it\nreflects something important about what happens when you design for the\nfull complexity of enterprise use cases from day one, rather than\noptimizing narrowly and retrofitting later.",[48,1164,1165],{},"Getting these things right taught us something, too: good architecture\ncreates options. 
And the options Pulsar's design left open would matter\nmore than we initially expected.",[40,1167,1169],{"id":1168},"from-managed-service-to-market-reality","From Managed Service to Market Reality",[48,1171,1172,1173,1176],{},"When we founded StreamNative, we brought Pulsar to the cloud as a\n",[44,1174,1175],{},"fully managed service",". Enterprises adopted it quickly for their most\ndemanding workloads --- financial transaction processing, IoT telemetry\nat scale, real-time fraud detection. Pulsar's multi-tenancy,\ngeo-replication, and unified messaging model made it a natural fit for\nuse cases where reliability and operational isolation aren't optional.",[48,1178,1179],{},"But as our customer base grew, a pattern emerged --- and it was\nremarkably consistent.",[48,1181,1182],{},"Most enterprises were running a split world. They used StreamNative for\nmission-critical messaging and queuing --- the workloads where\ntransactional guarantees and tenant isolation matter most. But their\ndata streaming and ingestion pipelines ran on Kafka. Not because Kafka\nwas architecturally superior for those use cases, but because the Kafka\nprotocol had become the industry's lingua franca. Every connector,\nevery SaaS tool, every cloud service spoke Kafka. Switching meant\nrewriting integrations, not just swapping infrastructure.",[48,1184,1185,1186,1189,1190,1193],{},"So we made a pragmatic call: we made StreamNative Kafka-compatible.\nFirst came ",[44,1187,1188],{},"KoP (Kafka-on-Pulsar)",", an open-source protocol handler\nletting Pulsar brokers speak the Kafka wire protocol natively. Then came\n",[44,1191,1192],{},"KSN (Kafka-on-StreamNative)"," --- a more deeply integrated,\nproduction-hardened compatibility layer built for enterprise scale.",[48,1195,1196],{},"This worked. Customers could consolidate their Kafka workloads onto our\nplatform without changing application code. 
But it taught us our first\ncrucial lesson:",[1198,1199,1200],"blockquote",{},[48,1201,1202,1205],{},[44,1203,1204],{},"Lesson 1:"," Protocol compatibility is table stakes, not a\ndifferentiator.",[48,1207,1208],{},"By the time we shipped KSN, every major streaming vendor was racing\ntoward Kafka compatibility --- including Confluent itself, which\nacquired WarpStream, a Kafka-compatible alternative. The protocol\nwasn't a moat anymore; it was becoming commodity infrastructure. If we\nwere going to build something with lasting value, it had to live\nsomewhere deeper than the wire protocol.",[48,1210,1211],{},"That realization forced a harder question: if the protocol layer was\nbeing commoditized, what actually mattered?",[40,1213,1215],{"id":1214},"rethinking-storage-and-the-moment-everything-clicked","Rethinking Storage -- and the Moment Everything Clicked",[48,1217,1218],{},"By 2023, we were operating large-scale streaming clusters across three\nmajor cloud providers, and we were seeing the same problem everywhere\n--- regardless of industry, workload, or team size.",[48,1220,1221,1228],{},[44,1222,1223,190],{},[55,1224,1227],{"href":1225,"rel":1226},"https:\u002F\u002Fstreamnative.io\u002Fblog\u002Fa-guide-to-evaluating-the-infrastructure-costs-of-apache-pulsar-and-apache-kafka",[264],"Cross-AZ replication costs were eating 60-90% of infrastructure\nbudgets","\nThe root cause was structural. Kafka's leader-per-partition\narchitecture --- and most streaming systems like it --- requires every\nwrite to be replicated from a leader broker to follower brokers,\ntypically across availability zones. In on-premises environments,\nthat's an architectural inconvenience. In the cloud, where cross-AZ\ndata transfer is metered, it becomes the single largest line item on the\nbill. We watched operational teams spending more cycles managing\ninfrastructure costs than shipping products. 
Something was fundamentally\nbroken about the economic model.",[48,1230,1231],{},"The rest of the industry was attacking this at the broker layer. Some\nwere rewriting streaming engines in C++ for raw throughput gains. Others\nwere offloading cold log segments to S3 as tiered storage, or adding\nobject storage as a secondary backend. These were legitimate engineering\nefforts --- but they were addressing symptoms rather than the underlying\ncause. They were making an expensive architecture incrementally more\nefficient, without stepping back to ask whether the architecture itself\nneeded rethinking.",[48,1233,1234],{},"We asked a different question.",[48,1236,1237],{},[44,1238,1239],{},"\"What if streaming data didn't need its own storage format at all?\nWhat if it could live natively in the lakehouse?\"",[48,1241,1242],{},"This question changed everything.",[48,1244,1245,1246,1249,1250,1253],{},"We built a new storage foundation from the ground up -- ",[44,1247,1248],{},"leaderless,\ndiskless, writing directly to object storage in open lakehouse\nformats",". Instead of Kafka's local log segments or Pulsar's\nBookKeeper ledgers, data went straight to S3, GCS, or Azure Blob Storage\nas Parquet files in Apache Iceberg or Delta Lake format. A distributed\nwrite-ahead log (WAL) handled the low-latency append path, ensuring\nproducers got sub-second acknowledgments. But the durable, queryable\ndata wasn't waiting to be exported or transformed into a lakehouse\nformat downstream --- it ",[36,1251,1252],{},"was"," a lakehouse table from the moment it was\ncommitted.",[48,1255,1256],{},"This wasn't tiered storage. It wasn't an export pipeline. It was a\nfundamental reconception of where streaming data lives --- and what it\ncan do from the moment it arrives.",[48,1258,1259],{},[351,1260],{"alt":18,"src":1261},"\u002Fimgs\u002Fblogs\u002Ffrom-streams-to-lakestreams-image4.png",[48,1263,1264],{},[36,1265,1266],{},"Figure 2. 
Ursa Architecture",[40,1268,1270],{"id":1269},"the-results-and-the-surprise-they-revealed","The Results --- and the Surprise They Revealed",[48,1272,1273],{},"The architectural payoff was immediate and measurable. No proprietary\nsegment format. No inter-broker replication. No cross-AZ data transfer\nfor durability --- that responsibility shifted to the object store\nitself, which delivers eleven-nines durability at a fraction of the cost\nof traditional streaming infrastructure.",[48,1275,1276,1277,1280,1281,1284],{},"The economics were stark: ",[44,1278,1279],{},"up to 95% cost reduction at 5 GB\u002Fs sustained\nthroughput",". We published the benchmark and the architecture openly.\nThe work was subsequently recognized with a ",[44,1282,1283],{},"Best Industry Paper award\nat VLDB 2025"," --- one of the most respected academic venues in data\nmanagement --- selected over submissions from Databricks, Meta, and\nAlibaba. We share this not to collect trophies, but because independent\nvalidation from the research community matters when you're asking the\nindustry to rethink a foundational assumption.",[48,1286,1287],{},"But the cost savings, as dramatic as they were, turned out to be the\nleast interesting part of what we'd built.",[48,1289,1290],{},"As we deployed this new storage layer with customers, something\nunexpected kept happening. Engineers would produce data to a Kafka topic\n--- same client code, same producers, same workflows they'd always\nused. Then they'd open Spark, Snowflake, or Databricks, and discover\nthey could query that exact data. No Kafka Connect. No sink connector.\nNo materialization pipeline. No batch ETL window to wait out.",[48,1292,1293],{},"The data was already there. Already in Iceberg format. Already a table.",[48,1295,1296],{},[44,1297,1298,1301,1302],{},[36,1299,1300],{},"\"Wait,\""," they'd say. ",[36,1303,1304],{},"\"My Kafka topic IS a table?\"",[48,1306,1307],{},"Yes. 
That's exactly what it is.",[48,1309,1310],{},"That moment --- repeated across customer after customer --- is when we\nunderstood what we'd actually built. Not a cheaper streaming engine.\nNot a better Kafka. Something that dissolved the boundary between\nstreaming infrastructure and the lakehouse entirely. The storage layer\nwasn't a cost optimization with a happy side effect. It was a\nunification --- one that made a decades-old architectural divide simply\ndisappear.",[1198,1312,1313],{},[48,1314,1315,1318],{},[44,1316,1317],{},"Lesson 2:"," The real breakthrough wasn't cost savings. It was\ndiscovering that streaming data and lakehouse data can be the same\nthing -- and that the bridge between them isn't a connector. It's\nthe storage layer itself.",[40,1320,1322],{"id":1321},"the-convergence-nobody-planned-and-what-the-industry-got-half-right","The Convergence Nobody Planned -- and What the Industry Got Half-Right",[48,1324,1325],{},"We weren't alone in recognizing the gap between streaming and the\nlakehouse. By the time we were deep in this problem, the entire industry\nwas trying to close it --- just from different directions, with\ndifferent foundational assumptions.",[48,1327,1328],{},"One camp built bridges. Confluent's Tableflow, Kafka Connect with\nIceberg sinks, and similar approaches treat streaming and the lakehouse\nas separate systems and materialize data between them. These solutions\nwork --- but they work by adding complexity: another pipeline to manage,\nanother failure mode to monitor, and an irreducible latency between when\ndata is produced and when it's queryable downstream.",[48,1330,1331],{},"Another camp added streaming capabilities to lakehouse platforms.\nDatabricks has Spark Structured Streaming and Delta Live Tables.\nSnowflake has Snowpipe Streaming and Dynamic Tables. These are genuinely\npowerful tools for analytics teams --- but streaming as an analytics\nfeature is categorically different from streaming as infrastructure. 
You\ncannot build a mission-critical messaging system, a financial\ntransaction backbone, or a real-time fraud detection pipeline on Spark\nStructured Streaming. The operational guarantees simply aren't there.",[48,1333,1334],{},"A third camp attempted something more ambitious: entirely new unified\narchitectures. Ververica's Streamhouse concept --- combining Apache\nFlink with Apache Paimon --- is a serious and thoughtful approach. But\nit asks organizations to adopt a new ecosystem wholesale, which means\nleaving Kafka compatibility, existing tooling, and years of operational\ninvestment behind.",[48,1336,1337],{},"Each approach solves part of the problem. None of them questions the\nassumption underneath it.",[48,1339,1340],{},"They all treat streaming and the lakehouse as fundamentally separate\nsystems that need to be connected --- and compete on how elegantly they\nbuild that connection.",[48,1342,1343],{},[44,1344,1345],{},"What if that assumption is wrong?",[48,1347,1348],{},"The historical parallel is hard to ignore. In 2020, Databricks\nintroduced the lakehouse concept with a deceptively simple insight: data\nwarehouses and data lakes didn't need to be separate systems connected\nby ETL pipelines. You could implement warehouse-grade capabilities ---\nACID transactions, schema enforcement, fine-grained governance ---\ndirectly on top of cheap, open-format lake storage. The lakehouse\ndidn't build a better bridge. It made the bridge unnecessary.",[48,1350,1351],{},"The streaming industry is standing at the same inflection point.",[48,1353,1354],{},"For years, we've accepted that real-time event streaming and analytical\ndata infrastructure are different systems with different storage\nformats, different operational models, and different teams responsible\nfor keeping them in sync. The connector ecosystem exists to paper over\nthat divide. 
But connectors are a symptom, not a solution --- evidence\nof an architectural boundary that perhaps shouldn't exist in the first\nplace.",[48,1356,1357],{},"We weren't setting out to write a manifesto. We were trying to fix\ninfrastructure costs. But the storage architecture we built kept\npointing toward the same conclusion: streaming and the lakehouse don't\nneed to be separate either.",[40,1359,1361],{"id":1360},"naming-what-we-built-lakestream","Naming What We Built: Lakestream",[48,1363,1364],{},"Looking back, the through-line was always there. Pulsar's\ncompute-storage separation. Kafka protocol compatibility.\nLakehouse-native storage. Each felt like a distinct product decision at\nthe time. In retrospect, they were pieces of the same architectural\nargument --- we just didn't have a name for it yet.",[48,1366,1367],{},"Now we do.",[48,1369,1370,1371,1373],{},"We call it ",[44,1372,1124],{}," -- a lakehouse-native streaming architecture\nthat treats streams as first-class lakehouse primitives alongside\ntables. Not a streaming system that exports to the lakehouse. Not a\nlakehouse that ingests from streams. A unified foundation where the\ndistinction stops being meaningful.",[48,1375,1376],{},"Just as the lakehouse dissolved the boundary between data warehouses and\ndata lakes, Lakestream dissolves the boundary between data streaming and\nthe lakehouse. Not by building better bridges. 
By making the bridge\nunnecessary.",[1198,1378,1379],{},[48,1380,1381,1384],{},[44,1382,1383],{},"Key Principle:"," Lakestream is NOT a replacement for the lakehouse.\nIt is an extension that augments the lakehouse with real-time\nstreaming capabilities -- adding streams as first-class primitives\nalongside tables.",[32,1386,1388],{"id":1387},"the-core-insight-push-interoperability-down-the-stack","The Core Insight: Push Interoperability Down the Stack",[48,1390,1391],{},"Most streaming systems today solve interoperability at the protocol\nlayer --- translating between Kafka, Pulsar, MQTT, and other protocols\nat the top of the stack. It's a reasonable approach, and it's where\nmost of the industry's engineering energy has gone. But it has a\nstructural consequence: every protocol becomes its own data silo, and\nconnecting them requires maintaining point-to-point translation at the\napplication layer indefinitely.",[48,1393,1394],{},"Lakestream takes a different approach. Rather than pushing\ninteroperability up to the protocol, we push it down --- to the storage\nand catalog layers, where it can be solved once and inherited by\neverything above it.",[48,1396,1397,1398,1401],{},"The result is an architectural property that's easy to state but hard\nto overstate: ",[44,1399,1400],{},"the protocol becomes a choice of interface, not a choice\nof data silo."," Write via Kafka. Consume via Pulsar. Query via SQL.\nSubscribe via MQTT. The data underneath is identical --- the same\nIceberg tables, the same catalog entries, the same durable objects in\nyour object store.",[48,1403,1404],{},"This is what makes Lakestream structurally different from compatibility\nlayers and connector ecosystems. Those approaches translate between\nsilos. 
Lakestream eliminates the silo.",[32,1406,1408],{"id":1407},"the-architecture","The Architecture",[48,1410,1411],{},"Lakestream is built on three layers -- each one a direct consequence of\nsomething we learned along the way:",[48,1413,1414],{},[351,1415],{"alt":18,"src":1416},"\u002Fimgs\u002Fblogs\u002Ffrom-streams-to-lakestreams-image2.png",[48,1418,1419],{},[36,1420,1421],{},"Figure 3. Lakestream Architecture",[48,1423,1424],{},[44,1425,1426],{},"1. The Data Layer: Cloud-Native Stream Storage",[48,1428,1429],{},"The foundation is lakehouse-native stream storage, and it's where the\neconomics of Lakestream begin.",[48,1431,1432,1433,1436,1437,1440],{},"A distributed write-ahead log handles real-time ingestion with\nsub-second producer acknowledgments. But rather than writing to\nproprietary broker-local storage, data is durably committed to object\nstorage --- S3, GCS, or Azure Blob --- as Parquet files organized as\n",[44,1434,1435],{},"Apache Iceberg"," or ",[44,1438,1439],{},"Delta Lake"," tables. The architecture is\nleaderless and diskless: any broker can serve any partition, and no\nlocal disks are required for durability.",[48,1442,1443,1444,1447],{},"The consequences are significant. Cross-AZ replication costs disappear\n--- durability is delegated to the object store, which provides\neleven-nines reliability at commodity pricing. The result is ",[44,1445,1446],{},"up to 95%\nlower infrastructure cost"," compared to traditional streaming\ndeployments at equivalent throughput.",[48,1449,1450],{},"But the deeper consequence isn't the cost. It's that streaming data is\nlakehouse data from the moment it's written --- no transformation, no\nexport, no pipeline in between.",[48,1452,1453],{},[44,1454,1455],{},"2. 
Metadata Layer: The Lakestream Catalog",[48,1457,1458],{},"If the data layer unifies how streams are stored, the catalog layer\nunifies how they're understood.",[48,1460,1461],{},"The Lakestream Catalog provides a single metadata plane for both streams\nand tables, organized around a three-level namespace ---\ncatalog.namespace.stream --- that will feel immediately familiar to\nanyone working in modern data platforms. Every stream has a\ncorresponding lakehouse table, and the catalog maintains that linkage\nautomatically. Producers don't need to think about it. Consumers don't\nneed to configure it. It's just there.",[48,1463,1464,1465,1468,1469,1472],{},"Critically, the Lakestream Catalog federates with the catalogs\norganizations already use --- ",[44,1466,1467],{},"Databricks Unity Catalog",", ",[44,1470,1471],{},"Snowflake\nHorizon Catalog",", and others. This means streams become discoverable in\nthe same metadata layer as batch tables, governed by the same policies,\nand visible to the same tools. Streaming data stops being invisible\ninfrastructure and starts being a first-class asset in your data\nplatform.",[48,1474,1475],{},[44,1476,1477],{},"3. Protocol Layer: Stateless Protocol Servers",[48,1479,1480],{},"The protocol layer is where Lakestream meets the world as it actually\nexists --- and where one of our hardest-learned lessons shaped the\ndesign most directly.",[48,1482,1483],{},"We've lived through what happens when an industry consolidates around a\nsingle protocol. Kafka's dominance brought enormous ecosystem benefits\n--- ubiquitous tooling, broad cloud integration, a generation of\nengineers who know it deeply. But it also meant the industry inherited\nKafka's limitations as fixed constraints. Messaging semantics that\nenterprises needed --- shared subscriptions, exclusive consumers,\nfailure queues --- simply weren't there. We built those capabilities\ninto Pulsar over a decade ago because real-world workloads demanded\nthem. 
Kafka added shared subscriptions years later, after the absence\nhad already forced countless teams into workarounds.",[48,1485,1486],{},"The lesson isn't that Kafka is wrong. It's that betting your entire\ndata architecture on a single protocol's roadmap is a structural risk\n--- one that compounds over time as your use cases grow beyond what that\nprotocol was originally designed to handle.",[48,1488,1489,1490],{},"Lakestream is built on the opposite principle: ",[44,1491,1492],{},"no single protocol owns\nthe architecture.",[48,1494,1495],{},"Kafka, Pulsar, REST, gRPC --- all implemented as stateless protocol\nservers writing to the same underlying storage layer. Your existing\nproducers and consumers work without code changes. Your existing\nconnectors and tooling work without reconfiguration. Adding support for\na new protocol means deploying a new stateless server --- not migrating\ndata, not redesigning pipelines, not waiting years for a standards\ncommittee to catch up to your use case.",[48,1497,1498],{},"This modularity is only possible because interoperability lives at the\nstorage layer, not the protocol layer. When the data underneath is\nprotocol-agnostic, the protocol above it becomes a genuine choice ---\nnot a lock-in decision made once and lived with indefinitely.",[48,1500,1501],{},"The protocol is an interface. The data belongs to everyone who needs it.\nAnd when the next protocol matters --- because it will --- you add it\nwithout touching anything underneath.",[40,1503,1505],{"id":1504},"what-this-changes","What This Changes",[48,1507,1508],{},"Lakestream isn't a product feature. 
It's an architectural shift ---\nand like most genuine architectural shifts, its implications extend well\nbeyond the layer where the change actually happens.",[48,1510,1511,1514],{},[44,1512,1513],{},"Stream-Table Duality"," is the most immediate consequence, and the one\nthat consistently surprises people when they see it for the first time.\nEvery stream is simultaneously a table. Produce to a Kafka topic and\nquery it from Spark, Snowflake, or Databricks --- not after a connector\nruns, not after a batch job completes, but immediately, because the data\nwas never anywhere else. The pipeline between streaming and analytics\ndoesn't get faster. It ceases to exist.",[48,1516,1517,1520],{},[44,1518,1519],{},"Governed Self-Service Streaming"," follows naturally from the catalog\nlayer. When streams live in the same metadata plane as batch tables,\nthey inherit the same access controls, the same audit trails, and the\nsame schema governance --- automatically. Data teams stop managing\nstreaming infrastructure as a separate operational concern and start\ntreating streams as first-class assets in the same platform they already\ngovern. This is what makes streaming accessible to the broader\norganization, not just the engineers who built the pipelines.",[48,1522,1523,1526],{},[44,1524,1525],{},"Multi-Protocol, Single Data"," means the protocol fragmentation that\nhas quietly balkanized data organizations for years simply stops. Write\nvia Kafka. Consume via Pulsar. Query via SQL. Subscribe via MQTT. The\ndata underneath is identical. Teams can use the interface that fits\ntheir workload rather than the one that fits the infrastructure they\ninherited.",[48,1528,1529,1532],{},[44,1530,1531],{},"Universal Linking"," replaces the brittle point-to-point connector\ntopology that most organizations have quietly accumulated over years of\ngrowth. 
Replicating data across clusters, regions, or systems happens\nthrough the shared storage and catalog layer --- not through a web of\nconnectors, each one a potential failure mode and a maintenance burden.\nThe architecture gets simpler as it scales, rather than more fragile.",[48,1534,1535,1538],{},[44,1536,1537],{},"Freedom to Evolve"," may be the most important long-term consequence,\nand the hardest to appreciate until you've been burned by protocol\nlock-in. By decoupling the protocol from the storage layer, Lakestream\nmakes the storage layer independently improvable. New compression\nschemes, new indexing strategies, new query optimizations --- none of\nthese require protocol changes, client updates, or application\nmigrations. The architecture can absorb innovation without disruption,\nwhich is a property that compounds in value over time.",[48,1540,1541],{},"Taken together, these aren't five separate benefits. They're five\nexpressions of the same underlying idea: when streaming and the\nlakehouse share a foundation, the constraints that have defined the\nstreaming category for a decade stop being constraints.",[48,1543,1544],{},[351,1545],{"alt":18,"src":1546},"\u002Fimgs\u002Fblogs\u002Ffrom-streams-to-lakestreams-image1.png",[48,1548,1549],{},[36,1550,1551],{},"Figure 4: Streaming Architecture Evolution: From Monolith to\nLakestream",[40,1553,1555],{"id":1554},"the-road-ahead","The Road Ahead",[48,1557,1558],{},"This week, we're moving from architecture to practice.",[48,1560,1561,1562,1565,1566,190],{},"We're launching ",[44,1563,1564],{},"Ursa for Kafka (UFK)"," --- a native Kafka service\nbuilt on the Lakestream foundation. And when we say native, we mean it\nprecisely: not Kafka-compatible, not a translation layer, but Apache\nKafka itself running on Lakestream's lakehouse-native stream storage.\nAny Kafka workload becomes lakehouse-native with zero code changes. No\nmigration. No reconfiguration. 
No compromise on the Kafka semantics your\napplications already depend on. We'll cover the full details in our ",[55,1567,1569],{"href":1568},"\u002Fblog\u002Fursa-for-kafka-native-apache-kafka-service-on-lakestream","companion post",[48,1571,1572],{},"We're also committed to open-sourcing Ursa and the core Lakestream\ncomponents in the coming months. We've thought carefully about this,\nand our conviction is straightforward: an architectural shift of this\nmagnitude belongs to the community, not to any single vendor. The\nlakehouse succeeded in part because its foundations --- Parquet,\nIceberg, Delta Lake --- were open and composable. We intend to build\nLakestream the same way.",[48,1574,1575],{},"Seven years ago, we thought we were building a better streaming\nplatform. And in the narrow sense, we were. But looking back at the full\narc --- Pulsar's compute-storage separation, the Kafka compatibility\nwork, the lakehouse-native storage breakthrough --- it's clear that\neach step wasn't a detour. It was the path.",[48,1577,1578],{},"We weren't building a better version of what already existed. We were\nworking, iteratively and sometimes without knowing it, toward something\nthe industry didn't have a name for yet.",[48,1580,1581],{},"Now it does.",[48,1583,1584],{},[44,1585,1586],{},"Lakestream.",[48,1588,1589],{},"If you're rethinking your streaming architecture --- or questioning\nassumptions you've held for years about where streaming ends and the\nlakehouse begins --- we'd like to think through it with you. The shift\nwe're describing isn't something one company builds alone. 
It's\nsomething the industry figures out together.",[48,1591,1592],{},"Let's get started!",{"title":18,"searchDepth":19,"depth":19,"links":1594},[1595,1596,1597,1598,1599,1600,1604,1605],{"id":1127,"depth":19,"text":1128},{"id":1168,"depth":19,"text":1169},{"id":1214,"depth":19,"text":1215},{"id":1269,"depth":19,"text":1270},{"id":1321,"depth":19,"text":1322},{"id":1360,"depth":19,"text":1361,"children":1601},[1602,1603],{"id":1387,"depth":279,"text":1388},{"id":1407,"depth":279,"text":1408},{"id":1504,"depth":19,"text":1505},{"id":1554,"depth":19,"text":1555},"How StreamNative's journey from Pulsar to lakehouse-native streaming crystallized into Lakestream — a new architecture where streams become first-class lakehouse primitives.","\u002Fimgs\u002Fblogs\u002Fblog-thumbnail-from-streams-to-lakestreams.png",{},{"title":1088,"description":1606},"blog\u002Ffrom-streams-to-lakestreams",[1084,1612,1613,1614,379],"Lakehouse","Iceberg","Thought Leadership","PnnPiDDqkat7LX1rM9sEtybR3T7tehuW2FLz5Gea1iQ",[1617,1633,1648,1664,1678,1694,1708],{"id":1618,"title":1090,"bioSummary":1619,"email":289,"extension":8,"image":1620,"linkedinUrl":1621,"meta":1622,"position":1629,"stem":1630,"twitterUrl":1631,"__hash__":1632},"authors\u002Fauthors\u002Fsijie-guo.md","Sijie’s journey with Apache Pulsar began at Yahoo! where he was part of the team working to develop a global messaging platform for the company. He then went to Twitter, where he led the messaging infrastructure group and co-created DistributedLog and Twitter EventBus. In 2017, he co-founded Streamlio, which was acquired by Splunk, and in 2019 he founded StreamNative. He is one of the original creators of Apache Pulsar and Apache BookKeeper, and remains VP of Apache BookKeeper and PMC Member of Apache Pulsar. 
Sijie lives in the San Francisco Bay Area of California.","\u002Fimgs\u002Fauthors\u002Fsijie-guo.webp","https:\u002F\u002Fwww.linkedin.com\u002Fin\u002Fsijieg\u002F",{"body":1623},{"type":15,"value":1624,"toc":1627},[1625],[48,1626,1619],{},{"title":18,"searchDepth":19,"depth":19,"links":1628},[],"CEO and Co-Founder, StreamNative, Apache Pulsar PMC Member","authors\u002Fsijie-guo","https:\u002F\u002Ftwitter.com\u002Fsijieg","krzMgsbADqGZT1TnpWTVzT4HJ9U7oZB9hzOMiDT5Wd0",{"id":1634,"title":1091,"bioSummary":1635,"email":289,"extension":8,"image":1636,"linkedinUrl":1637,"meta":1638,"position":1645,"stem":1646,"twitterUrl":289,"__hash__":1647},"authors\u002Fauthors\u002Fmatteo-merli.md","Matteo is the CTO at StreamNative, where he brings rich experience in distributed pub-sub messaging platforms. Matteo was one of the co-creators of Apache Pulsar during his time at Yahoo!. Matteo worked to create a global, distributed messaging system for Yahoo!, which would later become Apache Pulsar. Matteo is the PMC Chair of Apache Pulsar, where he helps to guide the community and ensure the success of the Pulsar project. He is also a PMC member for Apache BookKeeper. Matteo lives in Menlo Park, California.","\u002Fimgs\u002Fauthors\u002Fmatteo-merli.webp","https:\u002F\u002Fwww.linkedin.com\u002Fin\u002Fmatteomerli\u002F",{"body":1639},{"type":15,"value":1640,"toc":1643},[1641],[48,1642,1635],{},{"title":18,"searchDepth":19,"depth":19,"links":1644},[],"CTO, StreamNative & Co-Creator and PMC Chair Apache Pulsar","authors\u002Fmatteo-merli","MRLEjDgpe8SqHBoftSh_eiNGg-1oCJ30t7iV3Bb2NzQ",{"id":1649,"title":28,"bioSummary":1650,"email":289,"extension":8,"image":1651,"linkedinUrl":1652,"meta":1653,"position":1661,"stem":1662,"twitterUrl":289,"__hash__":1663},"authors\u002Fauthors\u002Fdavid-kjerrumgaard.md","David is a Principal Sales Engineer and former Developer Advocate for StreamNative. 
He has over 15 years of experience working with open source projects in the Big Data, Stream Processing, and Distributed Computing spaces. David is the author of Pulsar in Action.","\u002Fimgs\u002Fauthors\u002Fdavid-kjerrumgaard.webp","https:\u002F\u002Fwww.linkedin.com\u002Fin\u002Fdavidkj\u002F",{"body":1654},{"type":15,"value":1655,"toc":1659},[1656],[48,1657,1658],{},"David is a Sales Engineer and former Developer Advocate for StreamNative with a focus on helping developers solve their streaming data challenges using Apache Pulsar. He has over 15 years of experience working with open source projects in the Big Data, Stream Processing, and Distributed Computing spaces. David is the author of the book Pulsar in Action.",{"title":18,"searchDepth":19,"depth":19,"links":1660},[],"Principal Sales Engineer, StreamNative","authors\u002Fdavid-kjerrumgaard","-X5RI2tEofWI91uNkN4IduxJbMIoSTqxTinSYCBJcUw",{"id":1665,"title":1092,"bioSummary":1666,"email":289,"extension":8,"image":1667,"linkedinUrl":289,"meta":1668,"position":1675,"stem":1676,"twitterUrl":289,"__hash__":1677},"authors\u002Fauthors\u002Fkundan-vyas.md","Kundan is a Staff Product Manager at StreamNative, where he spearheads StreamNative Cloud, Lakehouse Storage and compute platform for connectivity, functions, and stream processing. 
Kundan also leads Partner Strategy at StreamNative, focusing on building strong, mutually beneficial relationships that enhance the company's offerings and reach.","\u002Fimgs\u002Fauthors\u002Fkundan-vyas.jpeg",{"body":1669},{"type":15,"value":1670,"toc":1673},[1671],[48,1672,1666],{},{"title":18,"searchDepth":19,"depth":19,"links":1674},[],"Staff Product Manager, StreamNative","authors\u002Fkundan-vyas","VP1uQZhRLPu59TJ5QDGtTyqngH5s9ui-f-nQNrRbqz0",{"id":1679,"title":313,"bioSummary":1680,"email":289,"extension":8,"image":1681,"linkedinUrl":1682,"meta":1683,"position":1690,"stem":1691,"twitterUrl":1692,"__hash__":1693},"authors\u002Fauthors\u002Fpenghui-li.md","Penghui Li is passionate about helping organizations to architect and implement messaging services. Prior to StreamNative, Penghui was a Software Engineer at Zhaopin.com, where he was the leading Pulsar advocate and helped the company adopt and implement the technology. He is an Apache Pulsar Committer and PMC member.","\u002Fimgs\u002Fauthors\u002Fpenghui-li.webp","https:\u002F\u002Fwww.linkedin.com\u002Fin\u002Fpenghui-li-244173184\u002F",{"body":1684},{"type":15,"value":1685,"toc":1688},[1686],[48,1687,1680],{},{"title":18,"searchDepth":19,"depth":19,"links":1689},[],"Director of Streaming, StreamNative & Apache Pulsar PMC Member","authors\u002Fpenghui-li","https:\u002F\u002Ftwitter.com\u002Flipenghui6","WDjET7GfxqVQJ8mTEMaRhgpxRdDy18qZkgQDJlwjvbI",{"id":1695,"title":312,"bioSummary":1696,"email":289,"extension":8,"image":1697,"linkedinUrl":289,"meta":1698,"position":1705,"stem":1706,"twitterUrl":289,"__hash__":1707},"authors\u002Fauthors\u002Fhang.md","Hang Chen, an Apache Pulsar and BookKeeper PMC member, is Director of Storage at StreamNative, where he leads the design of next-generation storage architectures and Lakehouse integrations. 
His work delivers scalable, high-performance infrastructure powering modern cloud-native event streaming platforms.","\u002Fimgs\u002Fauthors\u002Fhang.webp",{"body":1699},{"type":15,"value":1700,"toc":1703},[1701],[48,1702,1696],{},{"title":18,"searchDepth":19,"depth":19,"links":1704},[],"Director of Storage, StreamNative & Apache Pulsar PMC Member","authors\u002Fhang","titaSDxZRJWAW0SkpJSq43NuDvps9XQ6gZIMSPCtUwo",{"id":1709,"title":311,"bioSummary":1710,"email":289,"extension":8,"image":1711,"linkedinUrl":289,"meta":1712,"position":1722,"stem":1723,"twitterUrl":1724,"__hash__":1725},"authors\u002Fauthors\u002Fneng-lu.md","Neng Lu is currently the Director of Platform at StreamNative, where he leads the engineering team in developing the StreamNative ONE Platform and the next-generation Ursa engine. As an Apache Pulsar Committer, he specializes in advancing Pulsar Functions and Pulsar IO Connectors, contributing to the evolution of real-time data streaming technologies. Prior to joining StreamNative, Neng was a Senior Software Engineer at Twitter, where he focused on the Heron project, a cutting-edge real-time computing framework. 
He holds a Master's degree in Computer Science from the University of California, Los Angeles (UCLA) and a Bachelor's degree from Zhejiang University.","\u002Fimgs\u002Fauthors\u002Fneng-lu.jpeg",{"body":1713},{"type":15,"value":1714,"toc":1720},[1715,1717],[48,1716,1710],{},[48,1718,1719],{},"‍",{"title":18,"searchDepth":19,"depth":19,"links":1721},[],"Director of Engineering, StreamNative","authors\u002Fneng-lu","https:\u002F\u002Ftwitter.com\u002Fnlu90","R1K8DYRoq92ZrwHOmKtJMRfm-cuTjXTqAv0Cc3Q9IM4",[1727,1735,1739],{"path":1728,"title":1729,"date":1730,"image":1731,"link":-1,"collection":1732,"resourceType":1733,"score":1734,"id":1728},"\u002Fblog\u002Fwe-are-a-kafka-company-too","We Are a Kafka Company, Too","2026-04-01","\u002Fimgs\u002Fblogs\u002Fwe-are-a-kafka-company-too-cover.png","blogs","Blog",1,{"path":1568,"title":1736,"date":6,"image":1737,"link":-1,"collection":1732,"resourceType":1733,"score":1738,"id":1568},"Ursa For Kafka: Native Apache Kafka Service on Lakestream","\u002Fimgs\u002Fblogs\u002Fblog-thumbnail-ursa-for-kafka-native-apache-kafka-service-on-lakestream.png",0.8,{"path":1740,"title":1741,"date":1742,"image":1743,"link":-1,"collection":1732,"resourceType":1733,"score":1738,"id":1740},"\u002Fblog\u002Fyou-dont-need-to-shift-everything-left-lakehouse-first-thinking-is-all-you-need","You Don’t Need to Shift Everything Left; Lakehouse-First Thinking is all you need","2025-02-11","\u002Fimgs\u002Fblogs\u002F67ab955476e915d375e54c34_image-78.png",1776749891401]